Automated feature engineering is a double-edged sword. DFS can generate thousands of candidate features in minutes—far more than any human could manually create. But more features isn't always better.
Too many features lead to overfitting, longer training and inference times, higher memory use, and models that are harder to interpret and maintain.
The fundamental challenge:
From thousands of auto-generated features, how do we identify the subset that maximizes predictive power while minimizing complexity?
This page equips you with systematic methods to evaluate feature quality, detect redundancy, and select optimal feature subsets for your models.
By the end of this page, you will understand: univariate feature importance metrics, model-based feature selection methods, redundancy detection through correlation analysis, the feature selection taxonomy (filter, wrapper, embedded), and practical workflows for reducing feature dimensionality.
Univariate methods evaluate each feature independently, measuring its individual relationship with the target variable. These are the fastest evaluation methods—O(n) complexity for n features.
| Task Type | Feature Type | Target Type | Test |
|---|---|---|---|
| Classification | Numeric | Categorical | ANOVA F-test |
| Classification | Categorical | Categorical | Chi-squared test |
| Regression | Numeric | Numeric | Pearson correlation |
| Regression | Categorical | Numeric | Mutual information |
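The table above can be codified as a small lookup helper. This is only a sketch; select_univariate_test is a hypothetical name, not a scikit-learn function, and the mapping simply mirrors the table (f_regression is scikit-learn's F-test built on the Pearson correlation).

def select_univariate_test(task_type, feature_type):
    """Map (task type, feature type) to an appropriate univariate scoring function."""
    from sklearn.feature_selection import (
        f_classif, chi2, f_regression, mutual_info_regression
    )
    lookup = {
        ('classification', 'numeric'): f_classif,               # ANOVA F-test
        ('classification', 'categorical'): chi2,                # Chi-squared test
        ('regression', 'numeric'): f_regression,                # F-test from Pearson correlation
        ('regression', 'categorical'): mutual_info_regression,  # Mutual information
    }
    return lookup[(task_type, feature_type)]

score_func = select_univariate_test('classification', 'numeric')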
import numpy as np
import pandas as pd
from sklearn.feature_selection import (
    SelectKBest, SelectPercentile, f_classif, chi2,
    mutual_info_classif, f_regression, mutual_info_regression
)
from sklearn.preprocessing import MinMaxScaler

# Assume feature_matrix is from DFS and y is target
X = feature_matrix.fillna(0)  # Handle nulls
y = labels  # Binary classification target

# ANOVA F-test for classification (numeric features)
selector_f = SelectKBest(score_func=f_classif, k=50)
selector_f.fit(X, y)

# Get feature scores
f_scores = pd.DataFrame({
    'feature': X.columns,
    'f_score': selector_f.scores_,
    'p_value': selector_f.pvalues_
}).sort_values('f_score', ascending=False)

print("Top 10 Features by F-score:")
print(f_scores.head(10))

# Mutual Information (captures nonlinear relationships)
mi_scores = mutual_info_classif(X, y, random_state=42)
mi_df = pd.DataFrame({
    'feature': X.columns,
    'mi_score': mi_scores
}).sort_values('mi_score', ascending=False)

print("\nTop 10 Features by Mutual Information:")
print(mi_df.head(10))

# Compare rankings
top_f = set(f_scores.head(50)['feature'])
top_mi = set(mi_df.head(50)['feature'])
print(f"\nOverlap in top 50: {len(top_f & top_mi)} features")

A quick guide to interpreting these scores:

- F-statistic (ANOVA): ratio of between-class to within-class variance; large values mean the feature's mean shifts strongly across target classes, but only linear (mean-shift) effects are detected.
- Mutual Information: information-theoretic measure of dependence; captures nonlinear relationships, with a score of zero indicating independence.
- Chi-squared: tests independence between a non-negative (typically categorical or count) feature and a categorical target.
- Pearson Correlation (regression): measures the strength and direction of the linear relationship between a numeric feature and a numeric target.
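The chi-squared test requires non-negative inputs, which is why MinMaxScaler is imported in the block above. A minimal sketch of applying it to the same X and y, assuming the features are first scaled into [0, 1]:

# Chi-squared requires non-negative values, so rescale features to [0, 1] first
X_scaled = pd.DataFrame(
    MinMaxScaler().fit_transform(X),
    columns=X.columns
)

selector_chi2 = SelectKBest(score_func=chi2, k=50)
selector_chi2.fit(X_scaled, y)

chi2_scores = pd.DataFrame({
    'feature': X.columns,
    'chi2_score': selector_chi2.scores_,
    'p_value': selector_chi2.pvalues_
}).sort_values('chi2_score', ascending=False)

print("Top 10 Features by Chi-squared:")
print(chi2_scores.head(10))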
Univariate methods ignore feature interactions. A feature with low individual importance might be highly predictive when combined with others (XOR-like patterns). Always complement univariate with multivariate evaluation for robust selection.
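A quick illustration of the XOR caveat on synthetic data (the variable names here are purely illustrative): each feature alone carries essentially no information about the target, yet the two together determine it exactly.

rng = np.random.RandomState(0)
xor_a = rng.randint(0, 2, 5000)
xor_b = rng.randint(0, 2, 5000)
xor_target = xor_a ^ xor_b  # perfectly predictable from (a, b) together

X_xor = pd.DataFrame({'a': xor_a, 'b': xor_b})

# Univariate mutual information on each feature alone is near zero...
print(mutual_info_classif(X_xor, xor_target, discrete_features=True, random_state=0))

# ...but a model that uses both features is nearly perfect
from sklearn.tree import DecisionTreeClassifier
from sklearn.model_selection import cross_val_score
print(cross_val_score(DecisionTreeClassifier(), X_xor, xor_target, cv=5).mean())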
Model-based methods use trained models to assess feature importance, capturing both individual effects and interactions.
Decision trees and ensembles (Random Forest, XGBoost, LightGBM) provide built-in feature importance:
| Metric | Description | Best For |
|---|---|---|
| Gini Importance | Mean decrease in impurity | Fast, built-in |
| Split Count | Number of times feature is used | Understanding coverage |
| Gain | Total gain across all splits | Gradient boosting models |
| Permutation | Drop in score when shuffled | Model-agnostic, reliable |
import numpy as np
import pandas as pd
from sklearn.ensemble import RandomForestClassifier
from sklearn.inspection import permutation_importance
from sklearn.model_selection import train_test_split
import lightgbm as lgb

# Split data
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42
)

# Train Random Forest
rf = RandomForestClassifier(n_estimators=100, random_state=42, n_jobs=-1)
rf.fit(X_train, y_train)

# Method 1: Gini Importance (built-in)
gini_importance = pd.DataFrame({
    'feature': X.columns,
    'importance': rf.feature_importances_
}).sort_values('importance', ascending=False)

print("Top 10 Features by Gini Importance:")
print(gini_importance.head(10))

# Method 2: Permutation Importance (more reliable)
perm_importance = permutation_importance(
    rf, X_test, y_test, n_repeats=10, random_state=42, n_jobs=-1
)

perm_df = pd.DataFrame({
    'feature': X.columns,
    'importance_mean': perm_importance.importances_mean,
    'importance_std': perm_importance.importances_std
}).sort_values('importance_mean', ascending=False)

print("\nTop 10 Features by Permutation Importance:")
print(perm_df.head(10))

# Method 3: LightGBM Gain-based importance
lgb_model = lgb.LGBMClassifier(n_estimators=100, random_state=42)
lgb_model.fit(X_train, y_train)

lgb_importance = pd.DataFrame({
    'feature': X.columns,
    'gain': lgb_model.booster_.feature_importance(importance_type='gain'),
    'split': lgb_model.booster_.feature_importance(importance_type='split')
}).sort_values('gain', ascending=False)

print("\nTop 10 Features by LightGBM Gain:")
print(lgb_importance.head(10))

SHAP (SHapley Additive exPlanations) values provide theoretically grounded, consistent feature importance with several advantages: attributions sum exactly to each individual prediction (local accuracy), rankings remain consistent when a model comes to rely more heavily on a feature, and the same values support both per-prediction and global explanations.
import shap
import matplotlib.pyplot as plt

# Create SHAP explainer (use TreeExplainer for tree models)
explainer = shap.TreeExplainer(lgb_model)
shap_values = explainer.shap_values(X_test)

# For binary classification, shap_values is list of 2 arrays
# Use class 1 (positive class) for importance
if isinstance(shap_values, list):
    shap_vals = shap_values[1]
else:
    shap_vals = shap_values

# Global feature importance (mean absolute SHAP)
shap_importance = pd.DataFrame({
    'feature': X.columns,
    'mean_abs_shap': np.abs(shap_vals).mean(axis=0)
}).sort_values('mean_abs_shap', ascending=False)

print("Top 10 Features by SHAP Importance:")
print(shap_importance.head(10))

# Summary plot (beeswarm)
plt.figure(figsize=(10, 8))
shap.summary_plot(shap_vals, X_test, plot_type="bar", max_display=20)
plt.title("SHAP Feature Importance")
plt.tight_layout()
plt.savefig("shap_importance.png")

# Compare all methods
comparison = gini_importance.head(20).merge(
    perm_df.head(20), on='feature', how='outer'
).merge(
    shap_importance.head(20), on='feature', how='outer'
)
print("\nImportance Comparison (top 20 from each method):")
print(comparison)

Different importance methods often disagree on feature rankings. This isn't necessarily a problem—each method measures a different aspect of importance. When methods agree, you have high confidence. When they disagree, investigate why and consider keeping features that rank high in ANY method.
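One way to quantify how closely two methods agree is a rank correlation over the full importance vectors. A small sketch, assuming the gini_importance, perm_df, and shap_importance frames computed above:

from scipy.stats import spearmanr

# Align all scores on the same feature index
ranks = gini_importance.set_index('feature')[['importance']] \
    .join(perm_df.set_index('feature')[['importance_mean']]) \
    .join(shap_importance.set_index('feature')[['mean_abs_shap']])

rho_gini_perm, _ = spearmanr(ranks['importance'], ranks['importance_mean'])
rho_gini_shap, _ = spearmanr(ranks['importance'], ranks['mean_abs_shap'])
print(f"Gini vs permutation rank correlation: {rho_gini_perm:.2f}")
print(f"Gini vs SHAP rank correlation:        {rho_gini_shap:.2f}")

Correlations near 1 mean the methods would select largely the same features; low values are a cue to investigate the features on which they diverge.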
Automated feature engineering often produces highly correlated features—different paths to the same information. Redundancy detection identifies and removes these near-duplicates.
import numpy as np
import pandas as pd
from scipy import stats
from collections import defaultdict

def detect_redundant_features(X, threshold=0.95, method='pearson'):
    """
    Identify highly correlated feature pairs.

    Args:
        X: Feature DataFrame
        threshold: Correlation threshold (default 0.95)
        method: 'pearson', 'spearman', or 'kendall'

    Returns:
        List of (feature1, feature2, correlation) tuples
    """
    # Compute correlation matrix
    corr_matrix = X.corr(method=method)

    # Extract upper triangle (avoid duplicates)
    upper_tri = np.triu(np.ones_like(corr_matrix, dtype=bool), k=1)

    # Find pairs above threshold
    redundant_pairs = []
    for i, j in zip(*np.where(upper_tri)):
        if abs(corr_matrix.iloc[i, j]) >= threshold:
            redundant_pairs.append((
                corr_matrix.columns[i],
                corr_matrix.columns[j],
                corr_matrix.iloc[i, j]
            ))

    return sorted(redundant_pairs, key=lambda x: abs(x[2]), reverse=True)

# Find redundant features
redundant = detect_redundant_features(X, threshold=0.95)
print(f"Found {len(redundant)} highly correlated pairs (r >= 0.95):")
for f1, f2, corr in redundant[:10]:
    print(f"  {corr:.3f}: {f1} ↔ {f2}")

def remove_redundant_features(X, threshold=0.95, importance_scores=None):
    """
    Remove redundant features, keeping the more important one.

    Args:
        X: Feature DataFrame
        threshold: Correlation threshold
        importance_scores: Dict of feature -> importance (optional)

    Returns:
        List of features to keep
    """
    corr_matrix = X.corr().abs()
    features_to_drop = set()

    for i in range(len(corr_matrix.columns)):
        for j in range(i + 1, len(corr_matrix.columns)):
            if corr_matrix.iloc[i, j] >= threshold:
                col_i = corr_matrix.columns[i]
                col_j = corr_matrix.columns[j]

                # Drop the less important feature
                if importance_scores:
                    drop = col_i if importance_scores.get(col_i, 0) < \
                        importance_scores.get(col_j, 0) else col_j
                else:
                    # Without importance, drop the second one
                    drop = col_j

                features_to_drop.add(drop)

    features_to_keep = [c for c in X.columns if c not in features_to_drop]
    return features_to_keep

# Create importance dict from earlier analysis
importance_dict = dict(zip(shap_importance['feature'],
                           shap_importance['mean_abs_shap']))

features_to_keep = remove_redundant_features(
    X, threshold=0.95, importance_scores=importance_dict
)
print(f"\nOriginal features: {len(X.columns)}")
print(f"After redundancy removal: {len(features_to_keep)}")
print(f"Features removed: {len(X.columns) - len(features_to_keep)}")

When many features are inter-correlated, it's useful to identify feature clusters—groups of features that essentially measure the same underlying concept:
from scipy.cluster.hierarchy import linkage, fcluster
from scipy.spatial.distance import squareform
# Create distance matrix from correlation
corr = X.corr().abs()
distance_matrix = 1 - corr
# Hierarchical clustering
linkage_matrix = linkage(
    squareform(distance_matrix), method='average'
)
# Cut tree at threshold to get clusters
clusters = fcluster(linkage_matrix, t=0.3, criterion='distance')
# Group features by cluster
feature_clusters = defaultdict(list)
for feat, cluster_id in zip(X.columns, clusters):
    feature_clusters[cluster_id].append(feat)
# From each cluster, keep only the most important feature
representative_features = []
for cluster_id, features in feature_clusters.items():
    best_feat = max(features, key=lambda f: importance_dict.get(f, 0))
    representative_features.append(best_feat)
DFS often creates naturally redundant features. For example, SUM(orders.total_amount) and MEAN(orders.total_amount) × COUNT(orders) are perfectly correlated. Similarly, features at different depths may capture the same information. Redundancy removal is almost always necessary after DFS.
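A quick check of that identity on toy data (the orders table here is illustrative, not output from a real DFS run): per customer, the sum equals the mean times the count, so the two derived columns are identical and their correlation is exactly 1.

orders = pd.DataFrame({
    'customer_id': [1, 1, 1, 2, 2, 3],
    'total_amount': [10.0, 20.0, 30.0, 5.0, 15.0, 40.0]
})

agg = orders.groupby('customer_id')['total_amount'].agg(['sum', 'mean', 'count'])
agg['mean_times_count'] = agg['mean'] * agg['count']

# SUM and MEAN × COUNT are the same quantity, so their correlation is 1.0
print(agg[['sum', 'mean_times_count']])
print(agg['sum'].corr(agg['mean_times_count']))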
Feature selection methods fall into three categories based on how they interact with the learning algorithm:
Definition: Evaluate features independently of any machine learning model.
| Method | How it Works | Speed | Captures Interactions |
|---|---|---|---|
| Variance threshold | Remove near-constant features | Very fast | No |
| Univariate tests | Statistical tests per feature | Fast | No |
| Correlation filter | Remove highly correlated pairs | Fast | No |
| Mutual information | Information-theoretic measure | Moderate | Partially |
from sklearn.feature_selection import (
    VarianceThreshold, SelectKBest, f_classif, mutual_info_classif
)

# Step 1: Remove constant/near-constant features
variance_selector = VarianceThreshold(threshold=0.01)
X_var = variance_selector.fit_transform(X)
kept_cols = X.columns[variance_selector.get_support()]
print(f"After variance filter: {len(kept_cols)} features")

# Step 2: Remove highly correlated features
X_filtered = X[kept_cols]
corr_matrix = X_filtered.corr().abs()
upper_tri = corr_matrix.where(np.triu(np.ones_like(corr_matrix, dtype=bool), k=1))
to_drop = [col for col in upper_tri.columns if (upper_tri[col] > 0.95).any()]
X_decorr = X_filtered.drop(columns=to_drop)
print(f"After correlation filter: {len(X_decorr.columns)} features")

# Step 3: Select top k by univariate test
k = min(100, len(X_decorr.columns))
univariate_selector = SelectKBest(score_func=f_classif, k=k)
X_univariate = univariate_selector.fit_transform(X_decorr, y)
final_cols = X_decorr.columns[univariate_selector.get_support()]
print(f"After univariate filter: {len(final_cols)} features")

Definition: Use model performance to evaluate feature subsets.
| Method | How it Works | Speed | Optimal Guarantee |
|---|---|---|---|
| Forward selection | Add features greedily | Slow | No |
| Backward elimination | Remove features greedily | Slow | No |
| Recursive Feature Elimination | Remove least important iteratively | Moderate | No |
| Exhaustive search | Try all combinations | Very slow | Yes |
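The code below demonstrates recursive feature elimination. Forward selection (the first row of the table) can be sketched with scikit-learn's SequentialFeatureSelector; the stopping point of 20 features is an illustrative choice, and because the method is slow it should run on the pre-filtered columns (final_cols from the filter step above).

from sklearn.feature_selection import SequentialFeatureSelector
from sklearn.linear_model import LogisticRegression

# Greedy forward selection: start empty, repeatedly add the feature that
# most improves cross-validated ROC-AUC. Slow, so pre-filter features first.
sfs = SequentialFeatureSelector(
    LogisticRegression(max_iter=1000),
    n_features_to_select=20,   # illustrative stopping point
    direction='forward',
    scoring='roc_auc',
    cv=3,
    n_jobs=-1
)
sfs.fit(X[final_cols], y)
forward_selected = X[final_cols].columns[sfs.get_support()]
print(f"Forward selection kept {len(forward_selected)} features")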
from sklearn.feature_selection import RFE, RFECV
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score

# Recursive Feature Elimination with Cross-Validation
estimator = LogisticRegression(max_iter=1000, random_state=42)

# RFECV automatically finds optimal number of features
rfecv = RFECV(
    estimator=estimator,
    step=0.1,                   # Remove 10% of features each step
    cv=5,                       # 5-fold cross-validation
    scoring='roc_auc',
    min_features_to_select=10,
    n_jobs=-1
)

# Note: Wrapper methods are slow with many features
# Consider pre-filtering with filter methods first
X_prefiltered = X[final_cols]  # From filter step
rfecv.fit(X_prefiltered, y)

print(f"Optimal number of features: {rfecv.n_features_}")
print(f"Best CV score: {rfecv.cv_results_['mean_test_score'].max():.4f}")

# Get selected features
rfe_selected = X_prefiltered.columns[rfecv.support_]
print(f"\nSelected Features ({len(rfe_selected)}):")
for feat in rfe_selected[:10]:
    print(f"  {feat}")

Definition: Feature selection is built into the model training process.
| Method | How it Works | Speed | Notes |
|---|---|---|---|
| L1 Regularization (Lasso) | Drives coefficients to zero | Fast | Linear models only |
| ElasticNet | Combines L1 and L2 | Fast | Handles correlation |
| Tree importance + threshold | Built-in importance scores | Fast | Tree models only |
| Feature importance from boosting | Iterative importance | Moderate | Very effective |
from sklearn.linear_model import LassoCV, ElasticNetCV
from sklearn.feature_selection import SelectFromModel
from sklearn.preprocessing import StandardScaler
from sklearn.pipeline import Pipeline

# L1 (Lasso) for embedded feature selection
# (For a classification target, an L1-penalized LogisticRegression is a common alternative.)
pipeline = Pipeline([
    ('scaler', StandardScaler()),
    ('lasso', LassoCV(cv=5, random_state=42, max_iter=2000))
])

pipeline.fit(X_prefiltered, y)
lasso_model = pipeline.named_steps['lasso']

# Features with non-zero coefficients
lasso_coefs = pd.DataFrame({
    'feature': X_prefiltered.columns,
    'coefficient': lasso_model.coef_
})
lasso_selected = lasso_coefs[lasso_coefs['coefficient'] != 0]
print(f"Lasso selected {len(lasso_selected)} features")

# Tree-based embedded selection
from sklearn.ensemble import GradientBoostingClassifier

gbc = GradientBoostingClassifier(n_estimators=100, random_state=42)
gbc.fit(X_prefiltered, y)

# Select features above median importance
selector = SelectFromModel(gbc, threshold='median', prefit=True)
X_embedded = selector.transform(X_prefiltered)
embedded_selected = X_prefiltered.columns[selector.get_support()]
print(f"\nGBM embedded selection: {len(embedded_selected)} features")

A critical but often overlooked aspect of feature selection is stability—do the same features get selected when you slightly perturb the data?
With correlated features or noisy data, feature selection can be unstable: rerunning the same procedure on a slightly different sample, or with a different random seed, may return a noticeably different feature set, often swapping one member of a correlated pair for another.
This instability indicates redundancy among the candidate features, a weak signal-to-noise ratio, or a sample too small to distinguish reliably between near-equivalent features.
Repeatedly subsample data, run selection, and count how often each feature is selected:
import numpy as np
from sklearn.linear_model import LassoCV
from sklearn.preprocessing import StandardScaler
from collections import Counter

def stability_selection(X, y, n_iterations=100, sample_fraction=0.5,
                        threshold=0.5, random_state=42):
    """
    Perform stability selection to identify robust features.

    Args:
        X: Feature DataFrame
        y: Target variable
        n_iterations: Number of subsampling iterations
        sample_fraction: Fraction of data to use per iteration
        threshold: Minimum selection frequency to include feature
        random_state: Random seed

    Returns:
        (DataFrame of selection frequencies, list of stable feature names)
    """
    np.random.seed(random_state)
    n_samples = len(X)
    sample_size = int(n_samples * sample_fraction)

    selection_counts = Counter()
    scaler = StandardScaler()

    for i in range(n_iterations):
        # Subsample without replacement
        indices = np.random.choice(n_samples, sample_size, replace=False)
        X_sample = X.iloc[indices]
        y_sample = y.iloc[indices] if hasattr(y, 'iloc') else y[indices]

        # Scale features
        X_scaled = scaler.fit_transform(X_sample)

        # Fit Lasso with cross-validation
        lasso = LassoCV(cv=3, random_state=i, max_iter=2000)
        lasso.fit(X_scaled, y_sample)

        # Record selected features (non-zero coefficients)
        selected = X.columns[lasso.coef_ != 0]
        selection_counts.update(selected)

        if (i + 1) % 20 == 0:
            print(f"Completed {i + 1}/{n_iterations} iterations")

    # Calculate selection frequencies
    frequencies = pd.DataFrame({
        'feature': list(selection_counts.keys()),
        'count': list(selection_counts.values())
    })
    frequencies['frequency'] = frequencies['count'] / n_iterations
    frequencies = frequencies.sort_values('frequency', ascending=False)

    # Apply threshold
    stable_features = frequencies[frequencies['frequency'] >= threshold]

    return frequencies, stable_features['feature'].tolist()

# Run stability selection
frequencies, stable_features = stability_selection(
    X_prefiltered, y, n_iterations=100, sample_fraction=0.5, threshold=0.5
)

print(f"Stable features (selected >50% of time): {len(stable_features)}")
print("\nTop 15 most stable features:")
print(frequencies.head(15))

Features with high stability scores (>70%) are reliably important across data variations. These should form the core of your feature set. Features with low stability (<30%) may be artifacts of specific samples and should be treated skeptically.
Now let's put it all together in a systematic pipeline for evaluating and selecting features from DFS output:
┌─────────────────────────┐
│ DFS Output (1000s) │
└───────────┬─────────────┘
▼
┌─────────────────────────┐
│ 1. Remove Constants │ Fast filter
│ (variance < 0.01) │ (~30% reduction)
└───────────┬─────────────┘
▼
┌─────────────────────────┐
│ 2. Remove Redundant │ Correlation filter
│ (correlation > 0.95) │ (~40% reduction)
└───────────┬─────────────┘
▼
┌─────────────────────────┐
│ 3. Univariate Filter │ Top N by MI or F-test
│ (keep top 500) │ (Controlled reduction)
└───────────┬─────────────┘
▼
┌─────────────────────────┐
│ 4. Model Importance │ SHAP or permutation
│ (rank features) │ (Quality ranking)
└───────────┬─────────────┘
▼
┌─────────────────────────┐
│ 5. Stability Selection │ Multiple subsamples
│ (keep stable >50%) │ (Robustness filter)
└───────────┬─────────────┘
▼
┌─────────────────────────┐
│ Final Feature Set │
│ (50-100 features) │
└─────────────────────────┘
import pandas as pd
import numpy as np
from sklearn.feature_selection import VarianceThreshold, SelectKBest, mutual_info_classif
from sklearn.ensemble import RandomForestClassifier
from sklearn.inspection import permutation_importance
from sklearn.model_selection import train_test_split

class FeatureEvaluationPipeline:
    """
    Systematic pipeline for evaluating and selecting DFS features.
    """

    def __init__(self, variance_threshold=0.01, correlation_threshold=0.95,
                 univariate_k=500, stability_iterations=50,
                 stability_threshold=0.5):
        self.variance_threshold = variance_threshold
        self.correlation_threshold = correlation_threshold
        self.univariate_k = univariate_k
        self.stability_iterations = stability_iterations
        self.stability_threshold = stability_threshold
        self.selected_features_ = None
        self.evaluation_report_ = {}

    def fit(self, X, y, verbose=True):
        """
        Run the complete evaluation pipeline.
        """
        original_count = len(X.columns)

        # Step 1: Variance filter
        X_current = self._variance_filter(X)
        self.evaluation_report_['after_variance'] = len(X_current.columns)
        if verbose:
            print(f"[1/5] Variance filter: {original_count} → {len(X_current.columns)}")

        # Step 2: Correlation filter
        X_current = self._correlation_filter(X_current)
        self.evaluation_report_['after_correlation'] = len(X_current.columns)
        if verbose:
            print(f"[2/5] Correlation filter: → {len(X_current.columns)}")

        # Step 3: Univariate filter
        X_current, univariate_scores = self._univariate_filter(X_current, y)
        self.evaluation_report_['after_univariate'] = len(X_current.columns)
        self.evaluation_report_['univariate_scores'] = univariate_scores
        if verbose:
            print(f"[3/5] Univariate filter: → {len(X_current.columns)}")

        # Step 4: Model importance
        importance_df = self._model_importance(X_current, y)
        self.evaluation_report_['importance'] = importance_df
        if verbose:
            print(f"[4/5] Model importance calculated")

        # Step 5: Stability selection
        stable_features = self._stability_selection(X_current, y)
        self.evaluation_report_['stable_features'] = stable_features
        self.selected_features_ = stable_features
        if verbose:
            print(f"[5/5] Stability selection: → {len(stable_features)} final features")

        return self

    def _variance_filter(self, X):
        X_filled = X.fillna(X.median())
        selector = VarianceThreshold(threshold=self.variance_threshold)
        selector.fit(X_filled)
        return X[X.columns[selector.get_support()]]

    def _correlation_filter(self, X):
        X_filled = X.fillna(X.median())
        corr = X_filled.corr().abs()
        upper = np.triu(np.ones_like(corr, dtype=bool), k=1)

        to_drop = set()
        for i in range(len(corr.columns)):
            for j in range(i + 1, len(corr.columns)):
                if corr.iloc[i, j] > self.correlation_threshold:
                    # Drop the one with higher mean correlation
                    mean_i = corr.iloc[i].mean()
                    mean_j = corr.iloc[j].mean()
                    to_drop.add(corr.columns[j if mean_j > mean_i else i])

        return X.drop(columns=list(to_drop))

    def _univariate_filter(self, X, y):
        X_filled = X.fillna(X.median())
        k = min(self.univariate_k, len(X.columns))
        selector = SelectKBest(score_func=mutual_info_classif, k=k)
        selector.fit(X_filled, y)

        scores = pd.DataFrame({
            'feature': X.columns,
            'mi_score': selector.scores_
        }).sort_values('mi_score', ascending=False)

        return X[X.columns[selector.get_support()]], scores

    def _model_importance(self, X, y):
        X_filled = X.fillna(X.median())
        X_train, X_test, y_train, y_test = train_test_split(
            X_filled, y, test_size=0.2, random_state=42
        )

        rf = RandomForestClassifier(n_estimators=100, random_state=42, n_jobs=-1)
        rf.fit(X_train, y_train)

        perm = permutation_importance(rf, X_test, y_test, n_repeats=10,
                                      random_state=42, n_jobs=-1)

        return pd.DataFrame({
            'feature': X.columns,
            'importance': perm.importances_mean
        }).sort_values('importance', ascending=False)

    def _stability_selection(self, X, y):
        # Simplified stability selection
        from collections import Counter
        from sklearn.linear_model import LassoCV
        from sklearn.preprocessing import StandardScaler

        X_filled = X.fillna(X.median())
        counts = Counter()

        for i in range(self.stability_iterations):
            idx = np.random.choice(len(X), int(0.5 * len(X)), replace=False)
            X_sub = X_filled.iloc[idx]
            y_sub = y.iloc[idx] if hasattr(y, 'iloc') else y[idx]

            X_scaled = StandardScaler().fit_transform(X_sub)
            lasso = LassoCV(cv=3, random_state=i, max_iter=2000)
            lasso.fit(X_scaled, y_sub)

            selected = X.columns[lasso.coef_ != 0]
            counts.update(selected)

        freq = {f: c / self.stability_iterations for f, c in counts.items()}
        stable = [f for f, p in freq.items() if p >= self.stability_threshold]
        return stable

    def transform(self, X):
        return X[self.selected_features_]

    def get_report(self):
        return self.evaluation_report_

# Usage
pipeline = FeatureEvaluationPipeline()
pipeline.fit(feature_matrix.fillna(0), y)
X_selected = pipeline.transform(feature_matrix)
print(f"\nFinal feature set: {X_selected.shape[1]} features")

Beyond individual feature quality, we need to evaluate feature sets as a whole:
The ultimate test—does the feature set improve model accuracy?
| Metric | Task | Interpretation |
|---|---|---|
| ROC-AUC | Binary classification | Ranking quality |
| Log Loss | Classification | Probability calibration |
| RMSE | Regression | Error magnitude |
| Lift @ K | Ranking | Top-K performance |
from sklearn.model_selection import cross_val_score
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.metrics import roc_auc_score, log_loss
import time

def evaluate_feature_set(X, y, feature_subset, cv=5):
    """
    Evaluate a feature set on multiple criteria.
    """
    X_subset = X[feature_subset].fillna(0)

    # Model performance
    model = GradientBoostingClassifier(n_estimators=100, random_state=42)
    auc_scores = cross_val_score(
        model, X_subset, y, cv=cv, scoring='roc_auc', n_jobs=-1
    )

    # Training time
    start = time.time()
    model.fit(X_subset, y)
    train_time = time.time() - start

    # Feature efficiency
    features_per_auc_point = len(feature_subset) / auc_scores.mean()

    return {
        'n_features': len(feature_subset),
        'auc_mean': auc_scores.mean(),
        'auc_std': auc_scores.std(),
        'train_time_seconds': train_time,
        'features_per_auc_point': features_per_auc_point
    }

# Compare feature sets
feature_sets = {
    'all_features': list(X.columns),
    'top_100_importance': importance_df.head(100)['feature'].tolist(),
    'stable_features': stable_features,
    'pipeline_output': pipeline.selected_features_
}

print("Feature Set Comparison:")
print("-" * 70)
for name, features in feature_sets.items():
    if len(features) > 0:
        metrics = evaluate_feature_set(X, y, features)
        print(f"\n{name}:")
        print(f"  Features: {metrics['n_features']}")
        print(f"  AUC: {metrics['auc_mean']:.4f} ± {metrics['auc_std']:.4f}")
        print(f"  Train time: {metrics['train_time_seconds']:.2f}s")
        print(f"  Efficiency: {metrics['features_per_auc_point']:.1f} features/AUC point")

| Metric | Formula | Interpretation |
|---|---|---|
| Feature Efficiency | AUC / log(n_features) | Performance per complexity |
| Stability Score | Mean selection frequency | Robustness across samples |
| Redundancy Score | Mean pairwise correlation | Information overlap |
| Coverage Score | Unique entities represented | Schema coverage |
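These set-level metrics aren't computed by any single library call. A minimal sketch of the first three, assuming the frequencies frame from stability selection and the evaluate_feature_set helper above (feature_set_quality is a hypothetical helper; coverage requires DFS entity metadata and is omitted):

def feature_set_quality(X, features, auc, stability_frequencies):
    """Summarize a feature set: efficiency, stability, and redundancy."""
    subset = X[features].fillna(0)

    # Feature efficiency: performance per unit of complexity (AUC / log(n_features))
    efficiency = auc / np.log(len(features)) if len(features) > 1 else auc

    # Stability: mean selection frequency of the chosen features
    freq_lookup = dict(zip(stability_frequencies['feature'],
                           stability_frequencies['frequency']))
    stability = np.mean([freq_lookup.get(f, 0.0) for f in features])

    # Redundancy: mean absolute pairwise correlation (off-diagonal entries only)
    corr = subset.corr().abs().values
    n = corr.shape[0]
    redundancy = (corr.sum() - n) / (n * (n - 1)) if n > 1 else 0.0

    return {'efficiency': efficiency, 'stability': stability, 'redundancy': redundancy}

metrics = evaluate_feature_set(X, y, stable_features)
print(feature_set_quality(X, stable_features, metrics['auc_mean'], frequencies))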
Feature evaluation transforms the raw output of automated feature engineering into a curated, high-quality feature set. To consolidate the key methods: univariate statistics give fast individual screening but miss interactions; model-based importance (Gini, permutation, SHAP) ranks features while accounting for interactions; correlation analysis and clustering strip out redundant near-duplicates; the filter, wrapper, and embedded taxonomy trades speed against thoroughness; stability selection confirms that chosen features survive data perturbation; and a staged pipeline chains these steps to reduce thousands of DFS features to a manageable set.
What's Next:
With features generated and evaluated, the final challenge is computational efficiency. The next page covers computational considerations—strategies for scaling automated feature engineering to large datasets while managing memory and processing time constraints.
You now have a complete toolkit for feature evaluation and selection. From univariate statistics to stability selection, you can systematically reduce DFS output to a curated set that maximizes predictive power while minimizing complexity.