Consider predicting house prices. Square footage matters. Number of bedrooms matters. But neither alone captures how price per square foot varies with bedroom count—a 2000 sq ft house with 2 bedrooms is valued differently than a 2000 sq ft house with 5 bedrooms. The interaction between these features creates new information.
Or consider fraud detection: transaction amount alone isn't suspicious. Night-time transactions alone aren't suspicious. But a large transaction at 3 AM from an account that never makes night purchases? That combination is a red flag that neither feature signals independently.
This is the power of interaction features: they capture relationships where the effect of one variable depends on the value of another. Without them, linear models are blind to these patterns, and even tree-based models may struggle to discover them efficiently.
This page covers the theory and practice of interaction features. You'll learn why interactions matter mathematically, how to identify candidate interactions, implementation strategies for different feature types, and techniques for managing the combinatorial explosion when features multiply.
The Mathematical Foundation:
Linear models express predictions as:
$$\hat{y} = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + ... + \beta_n x_n$$
This assumes additivity: the effect of increasing $x_1$ is the same regardless of $x_2$'s value. Adding an interaction term breaks this assumption:
$$\hat{y} = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \beta_{12} x_1 x_2$$
Now the effect of $x_1$ depends on $x_2$: $\frac{\partial \hat{y}}{\partial x_1} = \beta_1 + \beta_{12} x_2$
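For a concrete feel, with made-up coefficients $\beta_1 = 2$ and $\beta_{12} = 0.5$:

$$\frac{\partial \hat{y}}{\partial x_1} = 2 + 0.5\,x_2 \quad\Rightarrow\quad \text{slope } 2 \text{ when } x_2 = 0, \qquad \text{slope } 7 \text{ when } x_2 = 10$$

The marginal effect of $x_1$ is no longer a single number; it shifts with $x_2$.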
This captures:
| Interaction Type | Description | Example |
|---|---|---|
| Synergistic | Combined effect exceeds sum of individual effects | Exercise + Diet together reduce weight more than each alone |
| Antagonistic | Combined effect is less than sum (diminishing returns) | Advertising + Price discount—both attract buyers, but overlap exists |
| Threshold | Effect only appears when both conditions met | Education level only matters for high-income job applications |
| Modifying | One feature changes the direction/magnitude of another | Age modifies the effect of exercise on heart health |
| Necessary | Effect requires presence of both features | Key + Lock: neither works alone, together they open doors |
Linear models CANNOT learn interactions without explicit features. Tree-based models CAN learn interactions through sequential splitting (first split on X₁, then on X₂), but may need many splits to approximate smooth interaction surfaces. Neural networks can learn interactions in hidden layers. Explicitly creating interaction features often helps all model types—trees can split on them directly instead of approximating them with many splits, and neural networks converge faster.
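A minimal sketch of that first point, using synthetic data where the target is driven purely by an interaction (the data and model choice here are illustrative):

```python
import numpy as np
from sklearn.linear_model import LinearRegression

rng = np.random.default_rng(0)
x1, x2 = rng.normal(size=(2, 5000))
y = 3 * x1 * x2 + rng.normal(scale=0.1, size=5000)  # target depends only on the interaction

X_base = np.column_stack([x1, x2])           # additive features only
X_int = np.column_stack([x1, x2, x1 * x2])   # plus the explicit interaction

print(LinearRegression().fit(X_base, y).score(X_base, y))  # R^2 near 0: additive model is blind
print(LinearRegression().fit(X_int, y).score(X_int, y))    # R^2 near 1: explicit interaction fixes it
```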
Interaction features come in several forms depending on the types of features being combined.
Numerical × Numerical Interactions:
The simplest case—multiply two numerical features:
- height × weight → body mass proxy
- price × quantity → revenue
- years_experience × education_level → human capital proxy
- distance × time (inverse) → speed

Categorical × Categorical Interactions:
Create new category from combination:
- {gender} × {age_group} → {male_25-34, female_35-44, ...}
- {product_category} × {day_of_week} → captures category-specific weekly patterns
- {city} × {weather} → location-weather combinations

Numerical × Categorical Interactions:
Create category-specific versions of numerical features:
- income × gender → separate income effects by gender
- age × product_type → age effects vary by product
- price_sensitivity × customer_segment → segment-specific price response
```python
import pandas as pd
import numpy as np
from sklearn.preprocessing import PolynomialFeatures
from itertools import combinations


def create_numerical_interactions(
    df: pd.DataFrame,
    numeric_cols: list,
    interaction_type: str = 'multiplicative'
) -> pd.DataFrame:
    """
    Create interaction features between numerical columns.

    Parameters:
    -----------
    interaction_type: 'multiplicative', 'ratio', 'difference', or 'all'
    """
    interactions = pd.DataFrame(index=df.index)

    for col1, col2 in combinations(numeric_cols, 2):
        if interaction_type in ['multiplicative', 'all']:
            interactions[f'{col1}_x_{col2}'] = df[col1] * df[col2]

        if interaction_type in ['ratio', 'all']:
            # Avoid division by zero
            interactions[f'{col1}_div_{col2}'] = df[col1] / (df[col2] + 1e-8)
            interactions[f'{col2}_div_{col1}'] = df[col2] / (df[col1] + 1e-8)

        if interaction_type in ['difference', 'all']:
            interactions[f'{col1}_minus_{col2}'] = df[col1] - df[col2]

    return interactions


def create_categorical_interactions(
    df: pd.DataFrame,
    cat_cols: list,
    max_cardinality: int = 100
) -> pd.DataFrame:
    """
    Create interaction features between categorical columns.
    Limits output cardinality to prevent explosion.
    """
    interactions = pd.DataFrame(index=df.index)

    for col1, col2 in combinations(cat_cols, 2):
        combined = df[col1].astype(str) + '_' + df[col2].astype(str)

        # Check cardinality
        if combined.nunique() <= max_cardinality:
            interactions[f'{col1}_x_{col2}'] = combined
        else:
            # Keep only most frequent combinations
            top_values = combined.value_counts().head(max_cardinality).index
            interactions[f'{col1}_x_{col2}'] = combined.where(
                combined.isin(top_values), 'other'
            )

    return interactions


def create_num_cat_interactions(
    df: pd.DataFrame,
    num_cols: list,
    cat_cols: list
) -> pd.DataFrame:
    """
    Create numerical × categorical interactions.
    For each category, creates a version of the numerical feature.
    """
    interactions = pd.DataFrame(index=df.index)

    for num_col in num_cols:
        for cat_col in cat_cols:
            # One-hot style: separate column per category
            for category in df[cat_col].unique():
                mask = df[cat_col] == category
                col_name = f'{num_col}_when_{cat_col}_{category}'
                interactions[col_name] = df[num_col].where(mask, 0)

    return interactions


# Using sklearn's PolynomialFeatures for exhaustive interactions
def polynomial_interactions(
    df: pd.DataFrame,
    cols: list,
    degree: int = 2,
    include_bias: bool = False,
    interaction_only: bool = True
) -> pd.DataFrame:
    """
    Create polynomial interaction features using sklearn.
    interaction_only=True excludes squared terms (x1², x2²)
    """
    poly = PolynomialFeatures(
        degree=degree,
        include_bias=include_bias,
        interaction_only=interaction_only
    )
    X = df[cols].values
    X_poly = poly.fit_transform(X)
    feature_names = poly.get_feature_names_out(cols)

    return pd.DataFrame(X_poly, index=df.index, columns=feature_names)


# Example usage
df = pd.DataFrame({
    'height': [170, 165, 180, 175, 160],
    'weight': [70, 55, 85, 80, 50],
    'age': [25, 30, 35, 40, 45],
    'gender': ['M', 'F', 'M', 'M', 'F'],
    'city': ['NYC', 'LA', 'NYC', 'LA', 'NYC']
})

# Create all types of interactions
num_ints = create_numerical_interactions(df, ['height', 'weight', 'age'], 'all')
cat_ints = create_categorical_interactions(df, ['gender', 'city'])
mixed_ints = create_num_cat_interactions(df, ['height', 'weight'], ['gender'])

print("Numerical Interactions:")
print(num_ints.head())
```

With n features, there are n(n-1)/2 pairwise interactions, and the number explodes for higher-order combinations. Most interactions are noise. How do you identify the valuable ones?
Domain-Guided Selection:
The most reliable approach: use domain knowledge to hypothesize interactions.
| Domain | Interaction Hypothesis | Rationale |
|---|---|---|
| E-commerce | price × brand_tier | Premium brands may be less price-sensitive |
| Credit | income × debt | High income with high debt is different than high income alone |
| Healthcare | age × medication_count | Polypharmacy effects increase with age |
| Marketing | channel × time_of_day | Email works differently than SMS by time |
| Real estate | sqft × neighborhood | Price per sqft varies dramatically by location |
Data-Driven Detection:

When domain knowledge runs out, let the data nominate candidates:

- Residual analysis: if the residuals of a model fit without interactions correlate with X1 × X2, the interaction adds predictive value.
- Tree importance screening: if X1 × X2 ranks high but X1 and X2 individually rank low, the interaction is capturing value in its own right.
```python
import pandas as pd
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.inspection import partial_dependence
from itertools import combinations


def detect_interactions_via_residuals(
    X: pd.DataFrame,
    y: pd.Series,
    base_features: list,
    candidate_pairs: list = None
) -> pd.DataFrame:
    """
    Detect interactions by checking if residuals correlate with interaction terms.

    Parameters:
    -----------
    candidate_pairs: List of (col1, col2) tuples to test. If None, tests all pairs.
    """
    from sklearn.linear_model import LinearRegression

    # Fit base model without interactions
    model = LinearRegression()
    model.fit(X[base_features], y)
    residuals = y - model.predict(X[base_features])

    # Test candidate interactions
    if candidate_pairs is None:
        candidate_pairs = list(combinations(base_features, 2))

    results = []
    for col1, col2 in candidate_pairs:
        interaction = X[col1] * X[col2]
        correlation = np.corrcoef(residuals, interaction)[0, 1]
        results.append({
            'feature_1': col1,
            'feature_2': col2,
            'interaction_corr_with_residuals': abs(correlation),
            'indicates_interaction': abs(correlation) > 0.1
        })

    return pd.DataFrame(results).sort_values(
        'interaction_corr_with_residuals', ascending=False
    )


def friedman_h_statistic(
    model,
    X: pd.DataFrame,
    feature1: str,
    feature2: str,
    num_grid_points: int = 50
) -> float:
    """
    Approximate Friedman's H-statistic for interaction strength
    (a rough variance-based proxy, not the exact definition).

    H = 0 means no interaction
    H closer to 1 means strong interaction
    """
    # Get joint and individual partial dependences
    pd_12 = partial_dependence(
        model, X, features=[feature1, feature2],
        grid_resolution=num_grid_points
    )
    pd_1 = partial_dependence(
        model, X, features=[feature1],
        grid_resolution=num_grid_points
    )
    pd_2 = partial_dependence(
        model, X, features=[feature2],
        grid_resolution=num_grid_points
    )

    # Variance of joint PD minus sum of individual PD variances, normalized
    joint_var = np.var(pd_12['average'][0])
    sum_individual_var = np.var(pd_1['average'][0]) + np.var(pd_2['average'][0])

    if joint_var == 0:
        return 0.0

    h_stat = (joint_var - sum_individual_var) / joint_var
    return max(0.0, h_stat)  # Clamp to [0, 1]


def tree_based_interaction_importance(
    X: pd.DataFrame,
    y: pd.Series,
    base_features: list,
    n_top_interactions: int = 10
) -> pd.DataFrame:
    """
    Use tree feature importance to identify valuable interactions.
    """
    # Create all pairwise interactions
    X_with_ints = X[base_features].copy()
    interaction_cols = []

    for col1, col2 in combinations(base_features, 2):
        int_col = f'{col1}_x_{col2}'
        X_with_ints[int_col] = X[col1] * X[col2]
        interaction_cols.append(int_col)

    # Fit random forest
    rf = RandomForestRegressor(n_estimators=100, max_depth=10, random_state=42)
    rf.fit(X_with_ints, y)

    # Get importance for interaction features only
    importance_df = pd.DataFrame({
        'feature': X_with_ints.columns,
        'importance': rf.feature_importances_
    })

    # Filter to interactions and rank
    interactions = importance_df[importance_df['feature'].isin(interaction_cols)]
    return interactions.nlargest(n_top_interactions, 'importance')
```

Testing many interactions and selecting the best is a form of multiple hypothesis testing. Use held-out validation sets to confirm discovered interactions generalize. Cross-validation with interaction selection inside each fold prevents optimistic estimates.
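A small sketch of keeping interaction creation and selection inside each fold (the synthetic data, k=10, and Ridge model are arbitrary choices):

```python
import numpy as np
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import PolynomialFeatures
from sklearn.feature_selection import SelectKBest, f_regression
from sklearn.linear_model import Ridge
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(42)
X = rng.normal(size=(500, 8))
y = X[:, 0] * X[:, 1] + 0.5 * X[:, 2] + rng.normal(scale=0.1, size=500)

# Interaction creation and selection are both refit on each training fold,
# so the cross-validated score is not inflated by selection leakage.
pipe = Pipeline([
    ('interactions', PolynomialFeatures(degree=2, interaction_only=True, include_bias=False)),
    ('select', SelectKBest(f_regression, k=10)),
    ('model', Ridge()),
])
print(cross_val_score(pipe, X, y, cv=5).mean())
```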
Beyond simple multiplication, richer interaction types capture more complex relationships.
Polynomial Features:
Expanding to degree-2 polynomials includes the original features, their squares ($x_1^2, x_2^2, \dots$), and all pairwise products ($x_1 x_2, x_1 x_3, \dots$).
Degree-3 adds cubic terms and three-way interactions. The feature count grows rapidly: for n features and degree d, the count is $\binom{n+d}{d}$ (including the constant term).
| Original Features | Degree | Feature Count |
|---|---|---|
| 10 | 2 | 66 |
| 10 | 3 | 286 |
| 50 | 2 | 1,326 |
| 50 | 3 | 23,426 |
| 100 | 2 | 5,151 |
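These counts are easy to sanity-check; a small sketch using scikit-learn's PolynomialFeatures (with the bias column included, matching the formula above):

```python
from math import comb
import numpy as np
from sklearn.preprocessing import PolynomialFeatures

n, d = 10, 2
poly = PolynomialFeatures(degree=d, include_bias=True).fit(np.zeros((1, n)))
print(poly.n_output_features_, comb(n + d, d))  # 66 66
```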
Ratios, Differences, and Other Forms:

- X1 / X2 captures relative magnitude. Often more interpretable than products (e.g., price-to-earnings ratio, debt-to-income).
- X1 - X2 captures a gap or change. Useful for before/after, prediction-vs-actual, or competitive comparisons.
- min(X1, X2) or max(X1, X2) captures bottleneck or ceiling effects.
- |X1 - X2| captures disagreement magnitude regardless of direction.
- sqrt(X1 × X2) (the geometric mean) is less sensitive to outliers than the arithmetic product.
- (X1 > threshold) AND (X2 > threshold) creates binary indicators for joint conditions.
```python
import pandas as pd
import numpy as np


def create_advanced_interactions(
    df: pd.DataFrame,
    col1: str,
    col2: str
) -> pd.DataFrame:
    """
    Create a comprehensive set of interaction features between two numeric columns.
    """
    int_df = pd.DataFrame(index=df.index)
    x1, x2 = df[col1], df[col2]

    # Basic multiplicative
    int_df[f'{col1}_x_{col2}'] = x1 * x2

    # Ratios (with zero protection)
    eps = 1e-8
    int_df[f'{col1}_div_{col2}'] = x1 / (x2 + eps)
    int_df[f'{col2}_div_{col1}'] = x2 / (x1 + eps)

    # Differences
    int_df[f'{col1}_minus_{col2}'] = x1 - x2
    int_df[f'abs_diff_{col1}_{col2}'] = np.abs(x1 - x2)

    # Min/Max (bottleneck/ceiling)
    int_df[f'min_{col1}_{col2}'] = np.minimum(x1, x2)
    int_df[f'max_{col1}_{col2}'] = np.maximum(x1, x2)

    # Geometric and harmonic means
    int_df[f'geom_mean_{col1}_{col2}'] = np.sqrt(np.abs(x1 * x2)) * np.sign(x1 * x2)
    int_df[f'harm_mean_{col1}_{col2}'] = 2 * x1 * x2 / (x1 + x2 + eps)

    # Sum and average (sometimes useful for ensemble-like effects)
    int_df[f'sum_{col1}_{col2}'] = x1 + x2
    int_df[f'avg_{col1}_{col2}'] = (x1 + x2) / 2

    # Relative position (where is col1 relative to col2)
    int_df[f'{col1}_pct_of_{col2}'] = x1 / (x1 + x2 + eps)

    # Squared difference (emphasizes large gaps)
    int_df[f'sq_diff_{col1}_{col2}'] = (x1 - x2) ** 2

    # Log of product (if positive)
    positive_mask = (x1 > 0) & (x2 > 0)
    int_df[f'log_product_{col1}_{col2}'] = np.where(
        positive_mask,
        np.log(x1 + eps) + np.log(x2 + eps),
        np.nan
    )

    return int_df


def create_threshold_interactions(
    df: pd.DataFrame,
    num_col: str,
    threshold_col: str,
    thresholds: list = None
) -> pd.DataFrame:
    """
    Create threshold-based binary interactions.
    """
    int_df = pd.DataFrame(index=df.index)

    if thresholds is None:
        # Use quartiles as default thresholds
        thresholds = df[threshold_col].quantile([0.25, 0.5, 0.75]).tolist()

    for thresh in thresholds:
        thresh_name = f'{threshold_col}_gt_{thresh:.2f}'.replace('.', 'p')
        above_thresh = (df[threshold_col] > thresh).astype(int)

        # Numerical value when above threshold, 0 otherwise
        int_df[f'{num_col}_when_{thresh_name}'] = df[num_col] * above_thresh

        # Binary: both conditions met
        int_df[f'{num_col}_high_and_{thresh_name}'] = (
            (df[num_col] > df[num_col].median()) &
            (df[threshold_col] > thresh)
        ).astype(int)

    return int_df


# Example: Financial ratio interactions
def financial_ratio_features(df: pd.DataFrame) -> pd.DataFrame:
    """
    Domain-specific interaction features for financial data.
    """
    ratios = pd.DataFrame(index=df.index)

    # Profitability ratios
    ratios['profit_margin'] = df['net_income'] / (df['revenue'] + 1)
    ratios['roa'] = df['net_income'] / (df['total_assets'] + 1)
    ratios['roe'] = df['net_income'] / (df['equity'] + 1)

    # Leverage ratios
    ratios['debt_to_equity'] = df['total_debt'] / (df['equity'] + 1)
    ratios['debt_to_assets'] = df['total_debt'] / (df['total_assets'] + 1)

    # Efficiency ratios
    ratios['asset_turnover'] = df['revenue'] / (df['total_assets'] + 1)
    ratios['inventory_turnover'] = df['cogs'] / (df['inventory'] + 1)

    # Liquidity
    ratios['current_ratio'] = df['current_assets'] / (df['current_liabilities'] + 1)
    ratios['quick_ratio'] = (df['current_assets'] - df['inventory']) / (df['current_liabilities'] + 1)

    # DuPont decomposition (ROE = margin × turnover × leverage)
    ratios['dupont_leverage'] = df['total_assets'] / (df['equity'] + 1)

    return ratios
```

With 100 features, there are 4,950 pairwise interactions. Include three-way interactions and you have 161,700 features. This explosion causes problems: overfitting on spurious combinations, heavy memory and compute costs, multicollinearity, and a higher risk of false discoveries during selection.
Strategies for Taming the Explosion:
```python
import pandas as pd
import numpy as np
from sklearn.feature_selection import SelectKBest, f_regression
from sklearn.linear_model import LassoCV
from itertools import combinations


def importance_filtered_interactions(
    X: pd.DataFrame,
    y: pd.Series,
    base_features: list,
    importance_threshold: float = 0.01,
    max_interactions: int = 100
) -> pd.DataFrame:
    """
    Only create interactions between features that are individually important.
    """
    from sklearn.ensemble import RandomForestRegressor

    # Get feature importance
    rf = RandomForestRegressor(n_estimators=50, max_depth=6, random_state=42)
    rf.fit(X[base_features], y)

    importance = pd.Series(rf.feature_importances_, index=base_features)
    important_features = importance[importance > importance_threshold].index.tolist()

    print(f"Kept {len(important_features)} of {len(base_features)} features for interactions")

    # Create interactions only among important features
    interactions = pd.DataFrame(index=X.index)
    for col1, col2 in combinations(important_features, 2):
        if len(interactions.columns) >= max_interactions:
            break
        interactions[f'{col1}_x_{col2}'] = X[col1] * X[col2]

    return interactions


def lasso_selected_interactions(
    X: pd.DataFrame,
    y: pd.Series,
    base_features: list,
    alpha_range: tuple = (0.001, 1.0)
) -> list:
    """
    Use Lasso to automatically select useful interactions.
    Returns list of interaction feature names with non-zero coefficients.
    """
    # Create all pairwise interactions
    X_full = X[base_features].copy()
    for col1, col2 in combinations(base_features, 2):
        X_full[f'{col1}_x_{col2}'] = X[col1] * X[col2]

    # Standardize for fair regularization
    X_scaled = (X_full - X_full.mean()) / (X_full.std() + 1e-8)

    # Cross-validated Lasso
    lasso = LassoCV(cv=5, random_state=42)
    lasso.fit(X_scaled, y)

    # Get selected features (non-zero coefficients)
    selected = X_full.columns[lasso.coef_ != 0].tolist()

    # Filter to only interaction features
    interaction_features = [f for f in selected if '_x_' in f]

    print(f"Lasso selected {len(interaction_features)} interactions")
    print(f"Best alpha: {lasso.alpha_:.4f}")

    return interaction_features


def staged_interaction_addition(
    X: pd.DataFrame,
    y: pd.Series,
    base_features: list,
    candidate_interactions: list,
    validation_metric: callable,
    max_to_add: int = 20
) -> list:
    """
    Greedily add interactions one at a time if they improve validation score.
    """
    from sklearn.model_selection import cross_val_score
    from sklearn.ensemble import GradientBoostingRegressor

    current_features = base_features.copy()
    added_interactions = []

    base_model = GradientBoostingRegressor(n_estimators=50, max_depth=3, random_state=42)
    best_score = cross_val_score(base_model, X[current_features], y, cv=3).mean()
    print(f"Baseline score: {best_score:.4f}")

    for interaction in candidate_interactions:
        if len(added_interactions) >= max_to_add:
            break

        col1, col2 = interaction
        int_name = f'{col1}_x_{col2}'
        X[int_name] = X[col1] * X[col2]

        trial_features = current_features + [int_name]
        score = cross_val_score(base_model, X[trial_features], y, cv=3).mean()

        if score > best_score + 0.001:  # Require meaningful improvement
            current_features.append(int_name)
            added_interactions.append(int_name)
            best_score = score
            print(f"Added {int_name}, score: {score:.4f}")
        else:
            X.drop(columns=[int_name], inplace=True)

    print(f"Final score: {best_score:.4f} with {len(added_interactions)} interactions")
    return added_interactions
```

Factorization Machines (FM) model all pairwise interactions without explicitly creating them. Each feature gets a latent vector, and interactions are computed as dot products of these vectors.
This makes FM memory-efficient and able to generalize to unseen feature combinations. Libraries like libFM and xLearn provide FM implementations, and FMs are straightforward to implement in PyTorch.
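A minimal NumPy sketch of the FM prediction for one sample (the latent dimension and weights are made up here; a real FM learns them from data):

```python
import numpy as np

rng = np.random.default_rng(0)
n_features, k = 6, 4
w0, w = 0.1, rng.normal(size=n_features)           # global bias and linear weights
V = rng.normal(scale=0.1, size=(n_features, k))    # one k-dimensional latent vector per feature

def fm_predict(x: np.ndarray) -> float:
    # Pairwise term sum_{i<j} <v_i, v_j> x_i x_j, computed in O(n*k) via the identity
    # 0.5 * sum_f [ (sum_i V[i,f] x_i)^2 - sum_i V[i,f]^2 x_i^2 ]
    pairwise = 0.5 * np.sum((V.T @ x) ** 2 - (V.T ** 2) @ (x ** 2))
    return float(w0 + w @ x + pairwise)

print(fm_predict(rng.normal(size=n_features)))
```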
Let's see how interaction engineering works in a realistic scenario.
Case Study: Click-Through Rate Prediction
Predicting ad clicks involves user features, ad features, and contextual features. Key interactions:
| Interaction | Rationale | Expected Effect |
|---|---|---|
| user_age × ad_category | Age groups respond differently to product categories | Fashion ads click better with younger users |
| time_of_day × device_type | Mobile usage peaks at commute times, desktop at work hours | Captures device-time-specific engagement patterns |
| user_ctr_history × ad_position | Engaged users click even in lower positions | Position sensitivity varies by user type |
| ad_price × user_income_proxy | Price-sensitive users respond differently to premium products | Match ad pricing to user willingness-to-pay |
| query_ad_similarity × ad_freshness | Relevance matters more for new, untested ads | Fresh ads need higher relevance to earn clicks |
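A small sketch of how a few of these might be constructed on an impression-level table (the DataFrame and its values are hypothetical; column names follow the table above):

```python
import pandas as pd

impressions = pd.DataFrame({
    'user_age': [22, 35, 41],
    'ad_category': ['fashion', 'auto', 'fashion'],
    'time_of_day': ['morning', 'evening', 'night'],
    'device_type': ['mobile', 'desktop', 'mobile'],
    'user_ctr_history': [0.08, 0.01, 0.03],
    'ad_position': [1, 3, 2],
})

# Categorical × categorical: crossed time/device category
impressions['time_x_device'] = impressions['time_of_day'] + '_' + impressions['device_type']

# Numerical × categorical: separate age effect per ad category
for cat in impressions['ad_category'].unique():
    impressions[f'user_age_when_{cat}'] = impressions['user_age'].where(
        impressions['ad_category'] == cat, 0
    )

# Numerical × numerical: does historical engagement offset a low ad position?
impressions['ctr_x_position'] = impressions['user_ctr_history'] * impressions['ad_position']
print(impressions.head())
```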
Production Considerations:
Feature computation latency: Interactions must compute in real-time for serving. Pre-compute where possible.
Feature distribution shift: Monitor interaction feature distributions over time. User behavior changes can break learned interaction patterns.
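One way to monitor that, sketched as a simple population stability index (PSI) over an interaction feature's values; the bin count, synthetic data, and the 0.2 rule of thumb are illustrative:

```python
import numpy as np

def psi(reference: np.ndarray, current: np.ndarray, bins: int = 10) -> float:
    """Population Stability Index between training-time and serving-time samples."""
    edges = np.quantile(reference, np.linspace(0, 1, bins + 1))  # assumes a continuous feature
    ref_frac = np.histogram(reference, edges)[0] / len(reference) + 1e-6
    cur_frac = np.histogram(np.clip(current, edges[0], edges[-1]), edges)[0] / len(current) + 1e-6
    return float(np.sum((cur_frac - ref_frac) * np.log(cur_frac / ref_frac)))

rng = np.random.default_rng(1)
train_vals = rng.normal(size=10_000)            # interaction feature at training time
serve_vals = rng.normal(loc=0.3, size=10_000)   # shifted distribution at serving time
print(psi(train_vals, serve_vals))              # values above ~0.2 are often treated as a red flag
```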
Interpretability burden: Each interaction adds complexity. Document why each interaction was added and its expected direction.
Numerical stability: Division-based interactions need zero/null handling. Multiplication can overflow with large values.
Cardinality with categoricals: Categorical interactions can explode cardinality. Hash or limit to top-k combinations.
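A tiny sketch of the hashing option, using a stable hash so bucket assignments don't change between runs (the bucket count and columns are arbitrary):

```python
import hashlib
import pandas as pd

def hashed_cross(df: pd.DataFrame, col1: str, col2: str, n_buckets: int = 1024) -> pd.Series:
    """Cross two categorical columns and hash the result into a fixed number of buckets."""
    combined = df[col1].astype(str) + '_' + df[col2].astype(str)
    return combined.map(lambda s: int(hashlib.md5(s.encode()).hexdigest(), 16) % n_buckets)

df = pd.DataFrame({'city': ['NYC', 'LA', 'NYC'], 'weather': ['rain', 'sun', 'snow']})
df['city_x_weather_bucket'] = hashed_cross(df, 'city', 'weather')
print(df)
```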
Interaction features unlock predictive power that individual features cannot express. They're how we encode 'it depends'—when one feature's effect depends on another's value. Here are the key insights:

- Linear models cannot represent interactions unless you create them explicitly; trees and neural networks can learn them, but explicit interaction features still often help.
- Domain knowledge is the most reliable guide to candidate interactions; residual analysis, the H-statistic, and tree importance can surface the rest.
- Pairwise combinations explode combinatorially, so filter by individual importance, let Lasso select, add interactions greedily against a validation score, or use Factorization Machines to model them implicitly.
- Confirm discovered interactions on held-out data, and in production watch computation latency, distribution shift, numerical stability, and categorical cardinality.
You now understand how to design, detect, and manage interaction features. These features often provide the lift that separates good models from great ones—especially when domain knowledge guides their construction. Next, we'll explore time-based features, where temporal patterns and event sequences add another dimension to feature engineering.