Consider predicting house prices. Square footage matters. Number of bedrooms matters. But neither alone captures how price per square foot varies with bedroom count—a 2000 sq ft house with 2 bedrooms is valued differently than a 2000 sq ft house with 5 bedrooms. The interaction between these features creates new information.
Or consider fraud detection: transaction amount alone isn't suspicious. Night-time transactions alone aren't suspicious. But a large transaction at 3 AM from an account that never makes night purchases? That combination is a red flag that neither feature signals independently.
This is the power of interaction features: they capture relationships where the effect of one variable depends on the value of another. Without them, linear models are blind to these patterns, and even tree-based models may struggle to discover them efficiently.
This page covers the theory and practice of interaction features. You'll learn why interactions matter mathematically, how to identify candidate interactions, implementation strategies for different feature types, and techniques for managing the combinatorial explosion when features multiply.
The Mathematical Foundation:
Linear models express predictions as:
$$\hat{y} = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + ... + \beta_n x_n$$
This assumes additivity: the effect of increasing $x_1$ is the same regardless of $x_2$'s value. Adding an interaction term breaks this assumption:
$$\hat{y} = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \beta_{12} x_1 x_2$$
Now the effect of $x_1$ depends on $x_2$: $\frac{\partial \hat{y}}{\partial x_1} = \beta_1 + \beta_{12} x_2$
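For a concrete feel, with made-up coefficients $\beta_1 = 2$ and $\beta_{12} = 0.5$:

$$\frac{\partial \hat{y}}{\partial x_1} = 2 + 0.5\,x_2 \quad\Rightarrow\quad \text{slope } 2 \text{ when } x_2 = 0, \qquad \text{slope } 7 \text{ when } x_2 = 10$$

The marginal effect of $x_1$ is no longer a single number; it shifts with $x_2$.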
This captures:
| Interaction Type | Description | Example |
|---|---|---|
| Synergistic | Combined effect exceeds sum of individual effects | Exercise + Diet together reduce weight more than each alone |
| Antagonistic | Combined effect is less than sum (diminishing returns) | Advertising + Price discount—both attract buyers, but overlap exists |
| Threshold | Effect only appears when both conditions met | Education level only matters for high-income job applications |
| Modifying | One feature changes the direction/magnitude of another | Age modifies the effect of exercise on heart health |
| Necessary | Effect requires presence of both features | Key + Lock: neither works alone, together they open doors |
Linear models CANNOT learn interactions without explicit features. Tree-based models CAN learn interactions through sequential splitting (first split on X₁, then on X₂), but may need many splits to approximate smooth interaction surfaces. Neural networks can learn interactions in hidden layers. Explicitly creating interaction features often helps all model types—trees can split on them directly instead of approximating them with many splits, and neural networks converge faster.
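A minimal sketch of that first point, using synthetic data where the target is driven purely by an interaction (the data and model choice here are illustrative):

```python
import numpy as np
from sklearn.linear_model import LinearRegression

rng = np.random.default_rng(0)
x1, x2 = rng.normal(size=(2, 5000))
y = 3 * x1 * x2 + rng.normal(scale=0.1, size=5000)  # target depends only on the interaction

X_base = np.column_stack([x1, x2])           # additive features only
X_int = np.column_stack([x1, x2, x1 * x2])   # plus the explicit interaction

print(LinearRegression().fit(X_base, y).score(X_base, y))  # R^2 near 0: additive model is blind
print(LinearRegression().fit(X_int, y).score(X_int, y))    # R^2 near 1: explicit interaction fixes it
```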
Interaction features come in several forms depending on the types of features being combined.
Numerical × Numerical Interactions:
The simplest case—multiply two numerical features:
- height × weight → body mass proxy
- price × quantity → revenue
- years_experience × education_level → human capital proxy
- distance × time (inverse) → speed

Categorical × Categorical Interactions:
Create new category from combination:
- {gender} × {age_group} → {male_25-34, female_35-44, ...}
- {product_category} × {day_of_week} → captures category-specific weekly patterns
- {city} × {weather} → location-weather combinations

Numerical × Categorical Interactions:
Create category-specific versions of numerical features:
- income × gender → separate income effects by gender
- age × product_type → age effects vary by product
- price_sensitivity × customer_segment → segment-specific price response
```python
import pandas as pd
import numpy as np
from sklearn.preprocessing import PolynomialFeatures
from itertools import combinations


def create_numerical_interactions(
    df: pd.DataFrame,
    numeric_cols: list,
    interaction_type: str = 'multiplicative'
) -> pd.DataFrame:
    """
    Create interaction features between numerical columns.

    Parameters:
    -----------
    interaction_type: 'multiplicative', 'ratio', 'difference', or 'all'
    """
    interactions = pd.DataFrame(index=df.index)

    for col1, col2 in combinations(numeric_cols, 2):
        if interaction_type in ['multiplicative', 'all']:
            interactions[f'{col1}_x_{col2}'] = df[col1] * df[col2]

        if interaction_type in ['ratio', 'all']:
            # Avoid division by zero
            interactions[f'{col1}_div_{col2}'] = df[col1] / (df[col2] + 1e-8)
            interactions[f'{col2}_div_{col1}'] = df[col2] / (df[col1] + 1e-8)

        if interaction_type in ['difference', 'all']:
            interactions[f'{col1}_minus_{col2}'] = df[col1] - df[col2]

    return interactions


def create_categorical_interactions(
    df: pd.DataFrame,
    cat_cols: list,
    max_cardinality: int = 100
) -> pd.DataFrame:
    """
    Create interaction features between categorical columns.
    Limits output cardinality to prevent explosion.
    """
    interactions = pd.DataFrame(index=df.index)

    for col1, col2 in combinations(cat_cols, 2):
        combined = df[col1].astype(str) + '_' + df[col2].astype(str)

        # Check cardinality
        if combined.nunique() <= max_cardinality:
            interactions[f'{col1}_x_{col2}'] = combined
        else:
            # Keep only most frequent combinations
            top_values = combined.value_counts().head(max_cardinality).index
            interactions[f'{col1}_x_{col2}'] = combined.where(
                combined.isin(top_values), 'other'
            )

    return interactions


def create_num_cat_interactions(
    df: pd.DataFrame,
    num_cols: list,
    cat_cols: list
) -> pd.DataFrame:
    """
    Create numerical × categorical interactions.
    For each category, creates a version of the numerical feature.
    """
    interactions = pd.DataFrame(index=df.index)

    for num_col in num_cols:
        for cat_col in cat_cols:
            # One-hot style: separate column per category
            for category in df[cat_col].unique():
                mask = df[cat_col] == category
                col_name = f'{num_col}_when_{cat_col}_{category}'
                interactions[col_name] = df[num_col].where(mask, 0)

    return interactions


# Using sklearn's PolynomialFeatures for exhaustive interactions
def polynomial_interactions(
    df: pd.DataFrame,
    cols: list,
    degree: int = 2,
    include_bias: bool = False,
    interaction_only: bool = True
) -> pd.DataFrame:
    """
    Create polynomial interaction features using sklearn.
    interaction_only=True excludes squared terms (x1², x2²)
    """
    poly = PolynomialFeatures(
        degree=degree,
        include_bias=include_bias,
        interaction_only=interaction_only
    )
    X = df[cols].values
    X_poly = poly.fit_transform(X)
    feature_names = poly.get_feature_names_out(cols)

    return pd.DataFrame(X_poly, index=df.index, columns=feature_names)


# Example usage
df = pd.DataFrame({
    'height': [170, 165, 180, 175, 160],
    'weight': [70, 55, 85, 80, 50],
    'age': [25, 30, 35, 40, 45],
    'gender': ['M', 'F', 'M', 'M', 'F'],
    'city': ['NYC', 'LA', 'NYC', 'LA', 'NYC']
})

# Create all types of interactions
num_ints = create_numerical_interactions(df, ['height', 'weight', 'age'], 'all')
cat_ints = create_categorical_interactions(df, ['gender', 'city'])
mixed_ints = create_num_cat_interactions(df, ['height', 'weight'], ['gender'])

print("Numerical Interactions:")
print(num_ints.head())
```

With n features, there are n(n-1)/2 pairwise interactions, and the number explodes for higher-order combinations. Most interactions are noise. How do you identify the valuable ones?
Domain-Guided Selection:
The most reliable approach: use domain knowledge to hypothesize interactions.
| Domain | Interaction Hypothesis | Rationale |
|---|---|---|
| E-commerce | price × brand_tier | Premium brands may be less price-sensitive |
| Credit | income × debt | High income with high debt is different than high income alone |
| Healthcare | age × medication_count | Polypharmacy effects increase with age |
| Marketing | channel × time_of_day | Email works differently than SMS by time |
| Real estate | sqft × neighborhood | Price per sqft varies dramatically by location |
Data-Driven Detection:

When domain knowledge runs out, let the data nominate candidates:

- Residual analysis: if the residuals of a model fit without interactions correlate with X1 × X2, the interaction adds predictive value.
- Tree importance screening: if X1 × X2 ranks high but X1 and X2 individually rank low, the interaction is capturing value in its own right.
```python
import pandas as pd
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.inspection import partial_dependence
from itertools import combinations


def detect_interactions_via_residuals(
    X: pd.DataFrame,
    y: pd.Series,
    base_features: list,
    candidate_pairs: list = None
) -> pd.DataFrame:
    """
    Detect interactions by checking if residuals correlate with interaction terms.

    Parameters:
    -----------
    candidate_pairs: List of (col1, col2) tuples to test. If None, tests all pairs.
    """
    from sklearn.linear_model import LinearRegression

    # Fit base model without interactions
    model = LinearRegression()
    model.fit(X[base_features], y)
    residuals = y - model.predict(X[base_features])

    # Test candidate interactions
    if candidate_pairs is None:
        candidate_pairs = list(combinations(base_features, 2))

    results = []
    for col1, col2 in candidate_pairs:
        interaction = X[col1] * X[col2]
        correlation = np.corrcoef(residuals, interaction)[0, 1]
        results.append({
            'feature_1': col1,
            'feature_2': col2,
            'interaction_corr_with_residuals': abs(correlation),
            'indicates_interaction': abs(correlation) > 0.1
        })

    return pd.DataFrame(results).sort_values(
        'interaction_corr_with_residuals', ascending=False
    )


def friedman_h_statistic(
    model,
    X: pd.DataFrame,
    feature1: str,
    feature2: str,
    num_grid_points: int = 50
) -> float:
    """
    Approximate Friedman's H-statistic for interaction strength
    (a rough variance-based proxy, not the exact definition).

    H = 0 means no interaction
    H closer to 1 means strong interaction
    """
    # Get joint and individual partial dependences
    pd_12 = partial_dependence(
        model, X, features=[feature1, feature2],
        grid_resolution=num_grid_points
    )
    pd_1 = partial_dependence(
        model, X, features=[feature1],
        grid_resolution=num_grid_points
    )
    pd_2 = partial_dependence(
        model, X, features=[feature2],
        grid_resolution=num_grid_points
    )

    # Variance of joint PD minus sum of individual PD variances, normalized
    joint_var = np.var(pd_12['average'][0])
    sum_individual_var = np.var(pd_1['average'][0]) + np.var(pd_2['average'][0])

    if joint_var == 0:
        return 0.0

    h_stat = (joint_var - sum_individual_var) / joint_var
    return max(0.0, h_stat)  # Clamp to [0, 1]


def tree_based_interaction_importance(
    X: pd.DataFrame,
    y: pd.Series,
    base_features: list,
    n_top_interactions: int = 10
) -> pd.DataFrame:
    """
    Use tree feature importance to identify valuable interactions.
    """
    # Create all pairwise interactions
    X_with_ints = X[base_features].copy()
    interaction_cols = []

    for col1, col2 in combinations(base_features, 2):
        int_col = f'{col1}_x_{col2}'
        X_with_ints[int_col] = X[col1] * X[col2]
        interaction_cols.append(int_col)

    # Fit random forest
    rf = RandomForestRegressor(n_estimators=100, max_depth=10, random_state=42)
    rf.fit(X_with_ints, y)

    # Get importance for interaction features only
    importance_df = pd.DataFrame({
        'feature': X_with_ints.columns,
        'importance': rf.feature_importances_
    })

    # Filter to interactions and rank
    interactions = importance_df[importance_df['feature'].isin(interaction_cols)]
    return interactions.nlargest(n_top_interactions, 'importance')
```

Testing many interactions and selecting the best is a form of multiple hypothesis testing. Use held-out validation sets to confirm discovered interactions generalize. Cross-validation with interaction selection inside each fold prevents optimistic estimates.
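A small sketch of keeping interaction creation and selection inside each fold (the synthetic data, k=10, and Ridge model are arbitrary choices):

```python
import numpy as np
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import PolynomialFeatures
from sklearn.feature_selection import SelectKBest, f_regression
from sklearn.linear_model import Ridge
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(42)
X = rng.normal(size=(500, 8))
y = X[:, 0] * X[:, 1] + 0.5 * X[:, 2] + rng.normal(scale=0.1, size=500)

# Interaction creation and selection are both refit on each training fold,
# so the cross-validated score is not inflated by selection leakage.
pipe = Pipeline([
    ('interactions', PolynomialFeatures(degree=2, interaction_only=True, include_bias=False)),
    ('select', SelectKBest(f_regression, k=10)),
    ('model', Ridge()),
])
print(cross_val_score(pipe, X, y, cv=5).mean())
```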
Beyond simple multiplication, richer interaction types capture more complex relationships.
Polynomial Features:
Expanding to degree-2 polynomials includes the original features, their squares ($x_1^2, x_2^2, \dots$), and all pairwise products ($x_1 x_2, x_1 x_3, \dots$).
Degree-3 adds cubic terms and three-way interactions. The feature count grows rapidly: for n features and degree d, the count is $\binom{n+d}{d}$ (including the constant term).
| Original Features | Degree | Feature Count |
|---|---|---|
| 10 | 2 | 66 |
| 10 | 3 | 286 |
| 50 | 2 | 1,326 |
| 50 | 3 | 23,426 |
| 100 | 2 | 5,151 |
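These counts are easy to sanity-check; a small sketch using scikit-learn's PolynomialFeatures (with the bias column included, matching the formula above):

```python
from math import comb
import numpy as np
from sklearn.preprocessing import PolynomialFeatures

n, d = 10, 2
poly = PolynomialFeatures(degree=d, include_bias=True).fit(np.zeros((1, n)))
print(poly.n_output_features_, comb(n + d, d))  # 66 66
```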
Ratios, Differences, and Other Forms:

- X1 / X2 captures relative magnitude. Often more interpretable than products (e.g., price-to-earnings ratio, debt-to-income).
- X1 - X2 captures a gap or change. Useful for before/after, prediction-vs-actual, or competitive comparisons.
- min(X1, X2) or max(X1, X2) captures bottleneck or ceiling effects.
- |X1 - X2| captures disagreement magnitude regardless of direction.
- sqrt(X1 × X2) (the geometric mean) is less sensitive to outliers than the arithmetic product.
- (X1 > threshold) AND (X2 > threshold) creates binary indicators for joint conditions.
```python
import pandas as pd
import numpy as np


def create_advanced_interactions(
    df: pd.DataFrame,
    col1: str,
    col2: str
) -> pd.DataFrame:
    """
    Create a comprehensive set of interaction features between two numeric columns.
    """
    int_df = pd.DataFrame(index=df.index)
    x1, x2 = df[col1], df[col2]

    # Basic multiplicative
    int_df[f'{col1}_x_{col2}'] = x1 * x2

    # Ratios (with zero protection)
    eps = 1e-8
    int_df[f'{col1}_div_{col2}'] = x1 / (x2 + eps)
    int_df[f'{col2}_div_{col1}'] = x2 / (x1 + eps)

    # Differences
    int_df[f'{col1}_minus_{col2}'] = x1 - x2
    int_df[f'abs_diff_{col1}_{col2}'] = np.abs(x1 - x2)

    # Min/Max (bottleneck/ceiling)
    int_df[f'min_{col1}_{col2}'] = np.minimum(x1, x2)
    int_df[f'max_{col1}_{col2}'] = np.maximum(x1, x2)

    # Geometric and harmonic means
    int_df[f'geom_mean_{col1}_{col2}'] = np.sqrt(np.abs(x1 * x2)) * np.sign(x1 * x2)
    int_df[f'harm_mean_{col1}_{col2}'] = 2 * x1 * x2 / (x1 + x2 + eps)

    # Sum and average (sometimes useful for ensemble-like effects)
    int_df[f'sum_{col1}_{col2}'] = x1 + x2
    int_df[f'avg_{col1}_{col2}'] = (x1 + x2) / 2

    # Relative position (where is col1 relative to col2)
    int_df[f'{col1}_pct_of_{col2}'] = x1 / (x1 + x2 + eps)

    # Squared difference (emphasizes large gaps)
    int_df[f'sq_diff_{col1}_{col2}'] = (x1 - x2) ** 2

    # Log of product (if positive)
    positive_mask = (x1 > 0) & (x2 > 0)
    int_df[f'log_product_{col1}_{col2}'] = np.where(
        positive_mask,
        np.log(x1 + eps) + np.log(x2 + eps),
        np.nan
    )

    return int_df


def create_threshold_interactions(
    df: pd.DataFrame,
    num_col: str,
    threshold_col: str,
    thresholds: list = None
) -> pd.DataFrame:
    """
    Create threshold-based binary interactions.
    """
    int_df = pd.DataFrame(index=df.index)

    if thresholds is None:
        # Use quartiles as default thresholds
        thresholds = df[threshold_col].quantile([0.25, 0.5, 0.75]).tolist()

    for thresh in thresholds:
        thresh_name = f'{threshold_col}_gt_{thresh:.2f}'.replace('.', 'p')
        above_thresh = (df[threshold_col] > thresh).astype(int)

        # Numerical value when above threshold, 0 otherwise
        int_df[f'{num_col}_when_{thresh_name}'] = df[num_col] * above_thresh

        # Binary: both conditions met
        int_df[f'{num_col}_high_and_{thresh_name}'] = (
            (df[num_col] > df[num_col].median()) &
            (df[threshold_col] > thresh)
        ).astype(int)

    return int_df


# Example: Financial ratio interactions
def financial_ratio_features(df: pd.DataFrame) -> pd.DataFrame:
    """
    Domain-specific interaction features for financial data.
    """
    ratios = pd.DataFrame(index=df.index)

    # Profitability ratios
    ratios['profit_margin'] = df['net_income'] / (df['revenue'] + 1)
    ratios['roa'] = df['net_income'] / (df['total_assets'] + 1)
    ratios['roe'] = df['net_income'] / (df['equity'] + 1)

    # Leverage ratios
    ratios['debt_to_equity'] = df['total_debt'] / (df['equity'] + 1)
    ratios['debt_to_assets'] = df['total_debt'] / (df['total_assets'] + 1)

    # Efficiency ratios
    ratios['asset_turnover'] = df['revenue'] / (df['total_assets'] + 1)
    ratios['inventory_turnover'] = df['cogs'] / (df['inventory'] + 1)

    # Liquidity
    ratios['current_ratio'] = df['current_assets'] / (df['current_liabilities'] + 1)
    ratios['quick_ratio'] = (df['current_assets'] - df['inventory']) / (df['current_liabilities'] + 1)

    # DuPont decomposition (ROE = margin × turnover × leverage)
    ratios['dupont_leverage'] = df['total_assets'] / (df['equity'] + 1)

    return ratios
```

With 100 features, there are 4,950 pairwise interactions. Include three-way interactions and you have 161,700 features. This explosion causes problems: overfitting on spurious combinations, heavy memory and compute costs, multicollinearity, and a higher risk of false discoveries during selection.
Strategies for Taming the Explosion:
```python
import pandas as pd
import numpy as np
from sklearn.feature_selection import SelectKBest, f_regression
from sklearn.linear_model import LassoCV
from itertools import combinations


def importance_filtered_interactions(
    X: pd.DataFrame,
    y: pd.Series,
    base_features: list,
    importance_threshold: float = 0.01,
    max_interactions: int = 100
) -> pd.DataFrame:
    """
    Only create interactions between features that are individually important.
    """
    from sklearn.ensemble import RandomForestRegressor

    # Get feature importance
    rf = RandomForestRegressor(n_estimators=50, max_depth=6, random_state=42)
    rf.fit(X[base_features], y)

    importance = pd.Series(rf.feature_importances_, index=base_features)
    important_features = importance[importance > importance_threshold].index.tolist()

    print(f"Kept {len(important_features)} of {len(base_features)} features for interactions")

    # Create interactions only among important features
    interactions = pd.DataFrame(index=X.index)
    for col1, col2 in combinations(important_features, 2):
        if len(interactions.columns) >= max_interactions:
            break
        interactions[f'{col1}_x_{col2}'] = X[col1] * X[col2]

    return interactions


def lasso_selected_interactions(
    X: pd.DataFrame,
    y: pd.Series,
    base_features: list,
    alpha_range: tuple = (0.001, 1.0)
) -> list:
    """
    Use Lasso to automatically select useful interactions.
    Returns list of interaction feature names with non-zero coefficients.
    """
    # Create all pairwise interactions
    X_full = X[base_features].copy()
    for col1, col2 in combinations(base_features, 2):
        X_full[f'{col1}_x_{col2}'] = X[col1] * X[col2]

    # Standardize for fair regularization
    X_scaled = (X_full - X_full.mean()) / (X_full.std() + 1e-8)

    # Cross-validated Lasso
    lasso = LassoCV(cv=5, random_state=42)
    lasso.fit(X_scaled, y)

    # Get selected features (non-zero coefficients)
    selected = X_full.columns[lasso.coef_ != 0].tolist()

    # Filter to only interaction features
    interaction_features = [f for f in selected if '_x_' in f]

    print(f"Lasso selected {len(interaction_features)} interactions")
    print(f"Best alpha: {lasso.alpha_:.4f}")

    return interaction_features


def staged_interaction_addition(
    X: pd.DataFrame,
    y: pd.Series,
    base_features: list,
    candidate_interactions: list,
    validation_metric: callable,
    max_to_add: int = 20
) -> list:
    """
    Greedily add interactions one at a time if they improve validation score.
    """
    from sklearn.model_selection import cross_val_score
    from sklearn.ensemble import GradientBoostingRegressor

    current_features = base_features.copy()
    added_interactions = []

    base_model = GradientBoostingRegressor(n_estimators=50, max_depth=3, random_state=42)
    best_score = cross_val_score(base_model, X[current_features], y, cv=3).mean()
    print(f"Baseline score: {best_score:.4f}")

    for interaction in candidate_interactions:
        if len(added_interactions) >= max_to_add:
            break

        col1, col2 = interaction
        int_name = f'{col1}_x_{col2}'
        X[int_name] = X[col1] * X[col2]

        trial_features = current_features + [int_name]
        score = cross_val_score(base_model, X[trial_features], y, cv=3).mean()

        if score > best_score + 0.001:  # Require meaningful improvement
            current_features.append(int_name)
            added_interactions.append(int_name)
            best_score = score
            print(f"Added {int_name}, score: {score:.4f}")
        else:
            X.drop(columns=[int_name], inplace=True)

    print(f"Final score: {best_score:.4f} with {len(added_interactions)} interactions")
    return added_interactions
```

Factorization Machines (FM) model all pairwise interactions without explicitly creating them. Each feature gets a latent vector, and interactions are computed as dot products of these vectors.
This makes FM memory-efficient and able to generalize to unseen feature combinations. Libraries like libFM and xLearn provide FM implementations, and FMs are straightforward to implement in PyTorch.
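A minimal NumPy sketch of the FM prediction for one sample (the latent dimension and weights are made up here; a real FM learns them from data):

```python
import numpy as np

rng = np.random.default_rng(0)
n_features, k = 6, 4
w0, w = 0.1, rng.normal(size=n_features)           # global bias and linear weights
V = rng.normal(scale=0.1, size=(n_features, k))    # one k-dimensional latent vector per feature

def fm_predict(x: np.ndarray) -> float:
    # Pairwise term sum_{i<j} <v_i, v_j> x_i x_j, computed in O(n*k) via the identity
    # 0.5 * sum_f [ (sum_i V[i,f] x_i)^2 - sum_i V[i,f]^2 x_i^2 ]
    pairwise = 0.5 * np.sum((V.T @ x) ** 2 - (V.T ** 2) @ (x ** 2))
    return float(w0 + w @ x + pairwise)

print(fm_predict(rng.normal(size=n_features)))
```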
Let's see how interaction engineering works in a realistic scenario.
Case Study: Click-Through Rate Prediction
Predicting ad clicks involves user features, ad features, and contextual features. Key interactions:
| Interaction | Rationale | Expected Effect |
|---|---|---|
| user_age × ad_category | Age groups respond differently to product categories | Fashion ads click better with younger users |
| time_of_day × device_type | Mobile usage peaks at commute times, desktop at work hours | Captures device-time-specific engagement patterns |
| user_ctr_history × ad_position | Engaged users click even in lower positions | Position sensitivity varies by user type |
| ad_price × user_income_proxy | Price-sensitive users respond differently to premium products | Match ad pricing to user willingness-to-pay |
| query_ad_similarity × ad_freshness | Relevance matters more for new, untested ads | Fresh ads need higher relevance to earn clicks |
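A small sketch of how a few of these might be constructed on an impression-level table (the DataFrame and its values are hypothetical; column names follow the table above):

```python
import pandas as pd

impressions = pd.DataFrame({
    'user_age': [22, 35, 41],
    'ad_category': ['fashion', 'auto', 'fashion'],
    'time_of_day': ['morning', 'evening', 'night'],
    'device_type': ['mobile', 'desktop', 'mobile'],
    'user_ctr_history': [0.08, 0.01, 0.03],
    'ad_position': [1, 3, 2],
})

# Categorical × categorical: crossed time/device category
impressions['time_x_device'] = impressions['time_of_day'] + '_' + impressions['device_type']

# Numerical × categorical: separate age effect per ad category
for cat in impressions['ad_category'].unique():
    impressions[f'user_age_when_{cat}'] = impressions['user_age'].where(
        impressions['ad_category'] == cat, 0
    )

# Numerical × numerical: does historical engagement offset a low ad position?
impressions['ctr_x_position'] = impressions['user_ctr_history'] * impressions['ad_position']
print(impressions.head())
```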
Production Considerations:
Feature computation latency: Interactions must compute in real-time for serving. Pre-compute where possible.
Feature distribution shift: Monitor interaction feature distributions over time. User behavior changes can break learned interaction patterns.
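One way to monitor that, sketched as a simple population stability index (PSI) over an interaction feature's values; the bin count, synthetic data, and the 0.2 rule of thumb are illustrative:

```python
import numpy as np

def psi(reference: np.ndarray, current: np.ndarray, bins: int = 10) -> float:
    """Population Stability Index between training-time and serving-time samples."""
    edges = np.quantile(reference, np.linspace(0, 1, bins + 1))  # assumes a continuous feature
    ref_frac = np.histogram(reference, edges)[0] / len(reference) + 1e-6
    cur_frac = np.histogram(np.clip(current, edges[0], edges[-1]), edges)[0] / len(current) + 1e-6
    return float(np.sum((cur_frac - ref_frac) * np.log(cur_frac / ref_frac)))

rng = np.random.default_rng(1)
train_vals = rng.normal(size=10_000)            # interaction feature at training time
serve_vals = rng.normal(loc=0.3, size=10_000)   # shifted distribution at serving time
print(psi(train_vals, serve_vals))              # values above ~0.2 are often treated as a red flag
```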
Interpretability burden: Each interaction adds complexity. Document why each interaction was added and its expected direction.
Numerical stability: Division-based interactions need zero/null handling. Multiplication can overflow with large values.
Cardinality with categoricals: Categorical interactions can explode cardinality. Hash or limit to top-k combinations.
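A tiny sketch of the hashing option, using a stable hash so bucket assignments don't change between runs (the bucket count and columns are arbitrary):

```python
import hashlib
import pandas as pd

def hashed_cross(df: pd.DataFrame, col1: str, col2: str, n_buckets: int = 1024) -> pd.Series:
    """Cross two categorical columns and hash the result into a fixed number of buckets."""
    combined = df[col1].astype(str) + '_' + df[col2].astype(str)
    return combined.map(lambda s: int(hashlib.md5(s.encode()).hexdigest(), 16) % n_buckets)

df = pd.DataFrame({'city': ['NYC', 'LA', 'NYC'], 'weather': ['rain', 'sun', 'snow']})
df['city_x_weather_bucket'] = hashed_cross(df, 'city', 'weather')
print(df)
```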
Interaction features unlock predictive power that individual features cannot express. They're how we encode 'it depends'—when one feature's effect depends on another's value. Here are the key insights:

- Linear models cannot represent interactions unless you create them explicitly; trees and neural networks can learn them, but explicit interaction features still often help.
- Domain knowledge is the most reliable guide to candidate interactions; residual analysis, the H-statistic, and tree importance can surface the rest.
- Pairwise combinations explode combinatorially, so filter by individual importance, let Lasso select, add interactions greedily against a validation score, or use Factorization Machines to model them implicitly.
- Confirm discovered interactions on held-out data, and in production watch computation latency, distribution shift, numerical stability, and categorical cardinality.
You now understand how to design, detect, and manage interaction features. These features often provide the lift that separates good models from great ones—especially when domain knowledge guides their construction. Next, we'll explore time-based features, where temporal patterns and event sequences add another dimension to feature engineering.