Imagine you've solved hundreds of machine learning problems over your career. Each time you face a new dataset, you don't start from scratch—you draw on your accumulated experience. "This looks like a high-dimensional sparse dataset... regularized linear models usually work well here." "The classes are heavily imbalanced... I should try cost-sensitive learning or sampling techniques."
This is meta-learning in action: learning about learning. But humans accumulate this knowledge slowly, over years. Can machines do it systematically and at scale?
Meta-learning for algorithm selection is precisely this: training systems to predict which learning algorithms will perform best on new datasets, based on patterns learned from thousands of prior experiments. It's the key to making CASH practical—instead of searching from scratch, we start with informed guesses from meta-learning.
Meta-learning powers the warm-starting mechanisms in Auto-sklearn, dramatically reduces the optimization budget needed for good results, and embodies the principle that past experience should accelerate future learning.
This page explores how meta-learning works for algorithm selection: the data it needs, the models it builds, and how it's integrated into production AutoML systems.
By the end of this page, you will understand how to build meta-learning systems for algorithm selection: the role of meta-databases, instance-based and model-based approaches, the theoretical foundations of transfer across tasks, and the practical implementations used in state-of-the-art AutoML.
Meta-learning for algorithm selection operates at a level above individual learning tasks. Instead of learning to predict labels from features, we learn to predict algorithm performance from dataset characteristics.
The Meta-Learning Setup:
Base-level (Object-level): the ordinary learning task. Given a dataset D, fit a model that predicts labels from features.

Meta-level: given the meta-features f(D) describing a dataset, predict how well each candidate algorithm will perform on it.
The meta-learner M is trained on historical data: (meta-features, algorithm, performance) triplets from past experiments. Once trained, it predicts which algorithm will work best on a new dataset based only on that dataset's meta-features.
Formal Definition:
Given:

- A portfolio of candidate algorithms A_1, ..., A_k
- A collection of historical datasets D_1, ..., D_n
- A meta-feature extractor f mapping a dataset D to a fixed-length vector f(D)
- A performance measure p(A, D), e.g., cross-validated accuracy of algorithm A on dataset D
Train a meta-model: ĝ(A, f(D)) ≈ p(A, D) = E[performance of A on D]
For a new dataset D', select: A* = argmax_A ĝ(A, f(D'))
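A minimal sketch of this selection rule, using invented prediction values for ĝ:

```python
# Toy illustration of the selection rule A* = argmax_A g_hat(A, f(D')).
# The predicted performances below are made up for demonstration.
def select_algorithm(predictions: dict) -> str:
    """Pick the algorithm with the highest predicted performance."""
    return max(predictions, key=predictions.get)

g_hat = {"RandomForest": 0.87, "SVM_RBF": 0.83, "LogisticRegression": 0.79}
best = select_algorithm(g_hat)
print(best)  # RandomForest
```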
```python
import numpy as np
from typing import Dict, List, Tuple, Any
from sklearn.preprocessing import StandardScaler
from dataclasses import dataclass


@dataclass
class MetaExperiment:
    """A single experiment in the meta-database."""
    dataset_id: str
    meta_features: np.ndarray
    algorithm_name: str
    hyperparameters: Dict[str, Any]
    performance: float
    training_time: float


class MetaDatabase:
    """
    Meta-database storing historical algorithm performance data.

    This is the foundation of meta-learning: a structured repository
    of past experiments that the meta-learner trains on.
    """

    def __init__(self):
        self.experiments: List[MetaExperiment] = []
        self.dataset_meta_features: Dict[str, np.ndarray] = {}
        self.algorithm_names: List[str] = []

    def add_experiment(self, experiment: MetaExperiment):
        """Add a new experiment to the database."""
        self.experiments.append(experiment)
        self.dataset_meta_features[experiment.dataset_id] = experiment.meta_features
        if experiment.algorithm_name not in self.algorithm_names:
            self.algorithm_names.append(experiment.algorithm_name)

    def add_batch(self, dataset_id: str, meta_features: np.ndarray,
                  results: Dict[str, Tuple[float, float]]):
        """
        Add a batch of results for one dataset.

        Parameters:
            dataset_id: Unique dataset identifier
            meta_features: Meta-features of the dataset
            results: Dict mapping algorithm names to (performance, time) tuples
        """
        for algo_name, (performance, train_time) in results.items():
            exp = MetaExperiment(
                dataset_id=dataset_id,
                meta_features=meta_features,
                algorithm_name=algo_name,
                hyperparameters={},  # Could be expanded
                performance=performance,
                training_time=train_time
            )
            self.add_experiment(exp)

    def get_performance_matrix(self) -> Tuple[np.ndarray, List[str], List[str]]:
        """
        Build the dataset × algorithm performance matrix.

        Returns:
            performance_matrix: (n_datasets, n_algorithms) array
            dataset_ids: Ordered list of dataset IDs
            algorithm_names: Ordered list of algorithm names
        """
        dataset_ids = list(self.dataset_meta_features.keys())
        algo_names = self.algorithm_names
        n_datasets = len(dataset_ids)
        n_algos = len(algo_names)

        matrix = np.full((n_datasets, n_algos), np.nan)
        for exp in self.experiments:
            d_idx = dataset_ids.index(exp.dataset_id)
            a_idx = algo_names.index(exp.algorithm_name)
            matrix[d_idx, a_idx] = exp.performance

        return matrix, dataset_ids, algo_names

    def get_meta_feature_matrix(self) -> Tuple[np.ndarray, List[str]]:
        """
        Build the meta-feature matrix.

        Returns:
            meta_features: (n_datasets, n_meta_features) array
            dataset_ids: Ordered list of dataset IDs
        """
        dataset_ids = list(self.dataset_meta_features.keys())
        meta_features = np.array([
            self.dataset_meta_features[did] for did in dataset_ids
        ])
        return meta_features, dataset_ids

    def summary(self):
        """Print summary statistics."""
        perf_matrix, _, _ = self.get_performance_matrix()
        print("=== Meta-Database Summary ===")
        print(f"Datasets: {len(self.dataset_meta_features)}")
        print(f"Algorithms: {len(self.algorithm_names)}")
        print(f"Total experiments: {len(self.experiments)}")
        print(f"Coverage: {(~np.isnan(perf_matrix)).mean()*100:.1f}%")
        print(f"Algorithms: {self.algorithm_names}")
```

OpenML (openml.org) provides a massive public meta-database of experiments on thousands of datasets with dozens of algorithms. Auto-sklearn ships with meta-knowledge from 140+ OpenML datasets, enabling immediate warm-starting without building your own meta-database.
The simplest and often most effective meta-learning approach is instance-based or similarity-based: find datasets similar to the new one and recommend algorithms that worked well on those similar datasets.
The Algorithm:

1. Extract meta-features f(D') from the new dataset.
2. Normalize them with the same scaler used on the historical meta-features.
3. Find the k historical datasets whose meta-features are closest to f(D').
4. Recommend the algorithms (or configurations) that performed best on those neighbors.
Why Instance-Based Works:

Datasets with similar statistical structure tend to favor the same algorithms, so local evidence from neighbors is often predictive. There is no global model to misspecify, and every recommendation is traceable to concrete similar datasets, which makes the method both robust and interpretable.
Distance Metric Choices:

The distance metric significantly impacts performance:

- Euclidean (L2): standard choice, but dominated by the meta-features with the largest deviations.
- Manhattan (L1): more robust to outlying meta-feature values; this is the metric Auto-sklearn uses.
- Cosine: compares the relative profile of meta-features, ignoring overall magnitude.

Whatever the metric, normalizing meta-features first is essential, since they live on wildly different scales.
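To see why the metric matters, here is a small sketch with invented, already-standardized meta-feature vectors where L1 and L2 disagree about which historical dataset is nearest:

```python
import numpy as np

# Invented, standardized meta-features: two historical datasets and a query.
query = np.array([0.0, 0.0])
candidates = {
    "dataset_B": np.array([3.0, 0.0]),  # far along one meta-feature only
    "dataset_C": np.array([1.8, 1.8]),  # moderately far along both
}

def nearest(metric: str) -> str:
    """Return the candidate closest to the query under the given metric."""
    dists = {
        name: np.sum(np.abs(v - query)) if metric == "manhattan"
        else np.sqrt(np.sum((v - query) ** 2))
        for name, v in candidates.items()
    }
    return min(dists, key=dists.get)

print(nearest("manhattan"))  # dataset_B (L1 distances: 3.0 vs 3.6)
print(nearest("euclidean"))  # dataset_C (L2 distances: 3.0 vs ~2.55)
```

The same pair of neighbors flips order under the two metrics, so the "most similar dataset" (and hence the warm-start recommendation) depends on this choice.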
```python
import numpy as np
from sklearn.neighbors import NearestNeighbors
from sklearn.preprocessing import StandardScaler
from typing import List, Dict, Tuple


class InstanceBasedMetaLearner:
    """
    Instance-based (k-NN) meta-learning for algorithm selection.

    This is the approach used by Auto-sklearn for warm-starting.
    Given a new dataset, find similar past datasets and recommend
    algorithms that worked well on them.
    """

    def __init__(self, k: int = 25, metric: str = 'manhattan'):
        """
        Parameters:
            k: Number of similar datasets to consider
            metric: Distance metric ('euclidean', 'manhattan', 'cosine')
        """
        self.k = k
        self.metric = metric
        self.scaler = StandardScaler()
        self.knn = NearestNeighbors(n_neighbors=k, metric=metric)
        self.meta_features = None
        self.dataset_ids = None
        self.best_configs = None  # Best config per dataset

    def fit(self, meta_database: 'MetaDatabase'):
        """
        Build the meta-learner from historical data.

        Parameters:
            meta_database: MetaDatabase containing past experiments
        """
        # Get meta-features
        meta_features, dataset_ids = meta_database.get_meta_feature_matrix()

        # Normalize meta-features (critical for distance-based methods)
        self.meta_features = self.scaler.fit_transform(meta_features)
        self.dataset_ids = dataset_ids

        # Fit k-NN
        self.knn.fit(self.meta_features)

        # Store best configuration per dataset
        perf_matrix, _, algo_names = meta_database.get_performance_matrix()
        self.algo_names = algo_names

        # For each dataset, find best algorithm
        self.best_configs = {}
        for i, did in enumerate(dataset_ids):
            if not np.all(np.isnan(perf_matrix[i])):
                best_algo_idx = np.nanargmax(perf_matrix[i])
                self.best_configs[did] = {
                    'algorithm': algo_names[best_algo_idx],
                    'performance': perf_matrix[i, best_algo_idx]
                }

        # Also store full performance matrix for advanced queries
        self.perf_matrix = perf_matrix

    def recommend(self, new_meta_features: np.ndarray,
                  n_recommendations: int = 5) -> List[Dict]:
        """
        Recommend algorithms for a new dataset.

        Parameters:
            new_meta_features: Meta-features of the new dataset
            n_recommendations: Number of configurations to recommend

        Returns:
            List of recommended configurations with expected performance
        """
        # Normalize
        mf_scaled = self.scaler.transform(new_meta_features.reshape(1, -1))

        # Find k most similar datasets
        distances, indices = self.knn.kneighbors(mf_scaled)

        # Collect recommendations from similar datasets
        recommendations = []
        seen_algorithms = set()
        for idx, dist in zip(indices[0], distances[0]):
            dataset_id = self.dataset_ids[idx]
            if dataset_id in self.best_configs:
                config = self.best_configs[dataset_id]
                algo = config['algorithm']
                if algo not in seen_algorithms:
                    recommendations.append({
                        'algorithm': algo,
                        'expected_performance': config['performance'],
                        'source_dataset': dataset_id,
                        'distance': dist
                    })
                    seen_algorithms.add(algo)

        # Sort by expected performance
        recommendations.sort(key=lambda x: x['expected_performance'], reverse=True)
        return recommendations[:n_recommendations]

    def get_algorithm_ranking(self, new_meta_features: np.ndarray) -> List[Tuple[str, float]]:
        """
        Get a complete ranking of algorithms for the new dataset.

        Uses weighted voting from k nearest neighbors.
        """
        # Normalize
        mf_scaled = self.scaler.transform(new_meta_features.reshape(1, -1))

        # Find k most similar datasets
        distances, indices = self.knn.kneighbors(mf_scaled)

        # Compute distance-weighted performance per algorithm
        weights = 1.0 / (distances[0] + 1e-10)  # Inverse distance weighting
        weights /= weights.sum()

        algo_scores = {}
        for algo_idx, algo_name in enumerate(self.algo_names):
            weighted_sum = 0
            weight_sum = 0
            for neighbor_idx, weight in zip(indices[0], weights):
                perf = self.perf_matrix[neighbor_idx, algo_idx]
                if not np.isnan(perf):
                    weighted_sum += weight * perf
                    weight_sum += weight
            if weight_sum > 0:
                algo_scores[algo_name] = weighted_sum / weight_sum

        # Sort by score
        ranking = sorted(algo_scores.items(), key=lambda x: x[1], reverse=True)
        return ranking
```

Auto-sklearn's Meta-Learning Implementation:
Auto-sklearn uses the following approach:

1. Offline, compute meta-features for ~140 curated OpenML datasets and, for each dataset, store the best-performing configuration found by extensive optimization.
2. At runtime, compute the new dataset's meta-features and normalize them.
3. Find the k = 25 nearest historical datasets under L1 (Manhattan) distance.
4. Seed Bayesian optimization with the stored best configurations of those neighbors.
This warm-starting typically finds near-optimal performance in the first few configurations, rather than requiring hundreds of random trials.
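This claim can be illustrated with a purely synthetic simulation (every number below is invented): evaluating 100 hypothetical configurations in an order derived from a noisy meta-learned estimate tends to reach a near-best score within a handful of trials, whereas a random order has no such head start.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic "true" scores of 100 candidate configurations on a new dataset.
true_scores = rng.uniform(0.6, 0.9, size=100)

# A meta-learned ranking that is noisily correlated with the truth.
meta_estimate = true_scores + rng.normal(0, 0.03, size=100)
warm_order = np.argsort(-meta_estimate)   # best-predicted first
random_order = rng.permutation(100)       # no prior knowledge

def best_after(order: np.ndarray, n_trials: int) -> float:
    """Best true score found within the first n_trials evaluations."""
    return true_scores[order[:n_trials]].max()

print("warm-start after 5 trials: ", best_after(warm_order, 5))
print("random search after 5 trials:", best_after(random_order, 5))
```

With a reasonably accurate meta-estimate, the warm-started order typically lands within a whisker of the global optimum in the first few trials; random search needs many more evaluations to catch up.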
Instance-based meta-learning is sensitive to the quality of meta-features. If a meta-feature is noisy or irrelevant, it adds noise to distances. Feature selection on meta-features themselves (meta-feature selection!) can improve performance.
Model-based meta-learning trains explicit models to predict algorithm performance from meta-features. Instead of just comparing distances, we learn a function that generalizes patterns across the meta-database.
Approaches:
1. Algorithm Performance Regression
Train a regression model to predict: ĝ(f(D), A) ≈ p(A, D), the expected performance of algorithm A on a dataset with meta-features f(D).
This allows predicting performance for all algorithms on a new dataset, then selecting the best.
2. Algorithm Ranking
Train a model to predict pairwise preferences: h(f(D), A_i, A_j) ≈ P(A_i outperforms A_j on D).
Aggregate pairwise predictions into a complete ranking.
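One simple way to do this aggregation is Borda-style counting: score each algorithm by its expected number of pairwise wins. A sketch with an invented preference matrix:

```python
import numpy as np

algos = ["RF", "GBM", "SVM", "kNN"]

# P[i, j] = predicted probability that algos[i] outperforms algos[j]
# on the new dataset (numbers invented for illustration; diagonal unused).
P = np.array([
    [0.5, 0.6, 0.8, 0.9],
    [0.4, 0.5, 0.7, 0.8],
    [0.2, 0.3, 0.5, 0.6],
    [0.1, 0.2, 0.4, 0.5],
])

# Borda score: expected pairwise wins, excluding the self-comparison.
borda = P.sum(axis=1) - 0.5  # subtract the diagonal entry (always 0.5)
ranking = [algos[i] for i in np.argsort(-borda)]
print(ranking)  # ['RF', 'GBM', 'SVM', 'kNN']
```

Borda counting is robust to a few inconsistent pairwise predictions, since each algorithm's score averages over all of its comparisons.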
3. Best Algorithm Classification
Train a classifier: h(f(D)) → the label of the algorithm that performs best on D.
Simplest approach, but ignores performance margins and may have class imbalance.
```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor, RandomForestClassifier
from sklearn.preprocessing import StandardScaler, LabelEncoder
from typing import List, Dict, Tuple


class ModelBasedMetaLearner:
    """
    Model-based meta-learning for algorithm selection.

    Trains a model to predict algorithm performance from meta-features,
    enabling generalization beyond the k-NN similarity approach.
    """

    def __init__(self, model_type: str = 'regression'):
        """
        Parameters:
            model_type: 'regression' (predict performance),
                        'classification' (predict best algorithm),
                        'ranking' (predict pairwise preferences)
        """
        self.model_type = model_type
        self.scaler = StandardScaler()
        if model_type == 'regression':
            self.model = RandomForestRegressor(n_estimators=100, random_state=42)
        elif model_type == 'classification':
            self.model = RandomForestClassifier(n_estimators=100, random_state=42)
        else:
            raise ValueError(f"Unknown model type: {model_type}")
        self.algo_encoder = LabelEncoder()

    def fit(self, meta_database: 'MetaDatabase'):
        """Train the meta-learner on historical experiments."""
        meta_features, dataset_ids = meta_database.get_meta_feature_matrix()
        perf_matrix, _, algo_names = meta_database.get_performance_matrix()
        self.algo_names = algo_names
        self.algo_encoder.fit(algo_names)

        if self.model_type == 'regression':
            self._fit_regression(meta_features, perf_matrix)
        elif self.model_type == 'classification':
            self._fit_classification(meta_features, perf_matrix)

    def _fit_regression(self, meta_features: np.ndarray, perf_matrix: np.ndarray):
        """Fit regression model: predict performance for (dataset, algorithm) pairs."""
        X_list = []
        y_list = []
        n_datasets, n_algos = perf_matrix.shape
        for d_idx in range(n_datasets):
            for a_idx in range(n_algos):
                if not np.isnan(perf_matrix[d_idx, a_idx]):
                    # Concatenate meta-features with algorithm encoding
                    algo_one_hot = np.zeros(n_algos)
                    algo_one_hot[a_idx] = 1
                    x = np.concatenate([meta_features[d_idx], algo_one_hot])
                    y = perf_matrix[d_idx, a_idx]
                    X_list.append(x)
                    y_list.append(y)

        X = np.array(X_list)
        y = np.array(y_list)

        # Scale meta-features (not algorithm encoding)
        n_mf = meta_features.shape[1]
        X[:, :n_mf] = self.scaler.fit_transform(X[:, :n_mf])
        self.n_meta_features = n_mf
        self.model.fit(X, y)

    def _fit_classification(self, meta_features: np.ndarray, perf_matrix: np.ndarray):
        """Fit classification model: predict best algorithm per dataset."""
        # For each dataset, find best algorithm
        best_algos = []
        valid_indices = []
        for d_idx in range(len(perf_matrix)):
            if not np.all(np.isnan(perf_matrix[d_idx])):
                best_algo_idx = np.nanargmax(perf_matrix[d_idx])
                best_algos.append(self.algo_names[best_algo_idx])
                valid_indices.append(d_idx)

        X = self.scaler.fit_transform(meta_features[valid_indices])
        y = self.algo_encoder.transform(best_algos)
        self.n_meta_features = meta_features.shape[1]
        self.model.fit(X, y)

    def predict(self, new_meta_features: np.ndarray) -> Dict[str, float]:
        """
        Predict algorithm performances for a new dataset.

        Returns:
            Dict mapping algorithm names to predicted performances
        """
        if self.model_type == 'regression':
            return self._predict_regression(new_meta_features)
        elif self.model_type == 'classification':
            return self._predict_classification(new_meta_features)

    def _predict_regression(self, mf: np.ndarray) -> Dict[str, float]:
        """Predict performance for each algorithm."""
        mf_scaled = self.scaler.transform(mf.reshape(1, -1))
        predictions = {}
        n_algos = len(self.algo_names)
        for a_idx, algo_name in enumerate(self.algo_names):
            algo_one_hot = np.zeros(n_algos)
            algo_one_hot[a_idx] = 1
            x = np.concatenate([mf_scaled[0], algo_one_hot]).reshape(1, -1)
            predictions[algo_name] = self.model.predict(x)[0]
        return predictions

    def _predict_classification(self, mf: np.ndarray) -> Dict[str, float]:
        """Predict probability of each algorithm being best."""
        mf_scaled = self.scaler.transform(mf.reshape(1, -1))
        probas = self.model.predict_proba(mf_scaled)[0]
        predictions = {}
        for prob, algo_name in zip(probas, self.algo_encoder.classes_):
            predictions[algo_name] = prob
        return predictions

    def recommend(self, new_meta_features: np.ndarray,
                  n_recommendations: int = 5) -> List[Tuple[str, float]]:
        """Get top-n algorithm recommendations."""
        predictions = self.predict(new_meta_features)
        ranking = sorted(predictions.items(), key=lambda x: x[1], reverse=True)
        return ranking[:n_recommendations]

    def get_feature_importance(self) -> np.ndarray:
        """Get importance of meta-features in predicting algorithm performance."""
        importances = self.model.feature_importances_
        return importances[:self.n_meta_features]  # Exclude algorithm encoding
```

Comparison: Instance-Based vs Model-Based
| Aspect | Instance-Based (k-NN) | Model-Based |
|---|---|---|
| Generalization | Local only; fails for distant datasets | Can extrapolate if patterns generalize |
| Interpretability | High; show similar datasets | Medium; show feature importances |
| Robustness | Robust to model misspecification | Can overfit with limited meta-data |
| Scalability | O(n) per query without indexing | O(1) per query after training |
| New algorithms | Requires new experiments | Requires retraining (or few-shot transfer) |
Production systems often combine both: use instance-based for warm-starting (fast, robust), then use model-based predictions to guide exploration in regions where similar datasets haven't been explored.
Meta-learning for algorithm selection is fundamentally about transfer learning across datasets. Knowledge gained from one dataset should accelerate learning on another. Let's formalize this.
The Transfer Learning Perspective:
Define a distribution P(D) over datasets. Each dataset D induces a performance function: f_D(A) = p(A, D), the performance of algorithm A on D.
If datasets are drawn from the same distribution P(D), their performance functions f_D share structure. For example: tree ensembles tend to win on mid-sized tabular datasets, while regularized linear models tend to win on high-dimensional sparse ones; such regularities recur across many draws from P(D).
Meta-learning captures this shared structure through: a common meta-feature space that every dataset maps into, model parameters shared across tasks, and similarity relations between datasets that let evidence from one dataset inform predictions on another.
Multi-Task Learning Formulation:
Treat algorithm performance prediction on each dataset as a separate task. Use multi-task learning to share information: shared layers learn patterns common to all algorithms, while per-algorithm output heads capture each algorithm's idiosyncrasies.
```python
import numpy as np
import torch
import torch.nn as nn
from typing import Tuple, Dict, List


class MultiTaskMetaLearner(nn.Module):
    """
    Multi-task neural network for algorithm performance prediction.

    Uses shared layers to capture common patterns in how meta-features
    relate to algorithm performance, with task-specific output heads.
    """

    def __init__(self, n_meta_features: int, n_algorithms: int,
                 hidden_dims: List[int] = [128, 64]):
        super().__init__()
        self.n_algorithms = n_algorithms

        # Shared layers (capture common patterns)
        layers = []
        prev_dim = n_meta_features
        for hidden_dim in hidden_dims:
            layers.extend([
                nn.Linear(prev_dim, hidden_dim),
                nn.ReLU(),
                nn.Dropout(0.2)
            ])
            prev_dim = hidden_dim
        self.shared = nn.Sequential(*layers)

        # Algorithm-specific heads (capture algorithm-specific patterns)
        self.heads = nn.ModuleList([
            nn.Sequential(
                nn.Linear(hidden_dims[-1], 32),
                nn.ReLU(),
                nn.Linear(32, 1)
            )
            for _ in range(n_algorithms)
        ])

    def forward(self, meta_features: torch.Tensor) -> torch.Tensor:
        """
        Forward pass: predict performance for all algorithms.

        Parameters:
            meta_features: (batch_size, n_meta_features) tensor

        Returns:
            (batch_size, n_algorithms) performance predictions
        """
        shared_repr = self.shared(meta_features)
        predictions = []
        for head in self.heads:
            pred = head(shared_repr)
            predictions.append(pred)
        return torch.cat(predictions, dim=1)

    def predict_single(self, meta_features: torch.Tensor, algo_idx: int) -> torch.Tensor:
        """Predict for a specific algorithm."""
        shared_repr = self.shared(meta_features)
        return self.heads[algo_idx](shared_repr)


def train_multi_task_metalearner(meta_features: np.ndarray,
                                 performance_matrix: np.ndarray,
                                 n_epochs: int = 100,
                                 lr: float = 1e-3) -> MultiTaskMetaLearner:
    """
    Train the multi-task meta-learner.

    Parameters:
        meta_features: (n_datasets, n_meta_features) array
        performance_matrix: (n_datasets, n_algorithms) array with NaNs for missing
        n_epochs: Training epochs
        lr: Learning rate

    Returns:
        Trained model
    """
    n_datasets, n_mf = meta_features.shape
    n_algos = performance_matrix.shape[1]

    # Create model
    model = MultiTaskMetaLearner(n_mf, n_algos)
    optimizer = torch.optim.Adam(model.parameters(), lr=lr)

    # Convert to tensors
    X = torch.tensor(meta_features, dtype=torch.float32)
    Y = torch.tensor(performance_matrix, dtype=torch.float32)
    mask = ~torch.isnan(Y)  # Which entries are valid

    # Replace NaN with 0 for computation (masked out anyway)
    Y = torch.nan_to_num(Y, nan=0.0)

    model.train()
    for epoch in range(n_epochs):
        optimizer.zero_grad()
        predictions = model(X)

        # Masked MSE loss - only compute loss for observed entries
        loss = ((predictions - Y) ** 2 * mask).sum() / mask.sum()
        loss.backward()
        optimizer.step()

        if (epoch + 1) % 20 == 0:
            print(f"Epoch {epoch+1}/{n_epochs}, Loss: {loss.item():.4f}")

    model.eval()
    return model


def recommend_with_uncertainty(model: MultiTaskMetaLearner,
                               meta_features: np.ndarray,
                               n_samples: int = 50) -> Dict[int, Tuple[float, float]]:
    """
    Get recommendations with uncertainty estimates using MC dropout.

    Returns:
        Dict mapping algorithm index to (mean prediction, std) tuples
    """
    model.train()  # Enable dropout
    X = torch.tensor(meta_features, dtype=torch.float32).unsqueeze(0)

    predictions = []
    for _ in range(n_samples):
        with torch.no_grad():
            pred = model(X)
            predictions.append(pred.numpy())
    predictions = np.stack(predictions)

    results = {}
    for algo_idx in range(model.n_algorithms):
        algo_preds = predictions[:, 0, algo_idx]
        results[algo_idx] = (algo_preds.mean(), algo_preds.std())

    model.eval()
    return results
```

Theoretical Foundations:
When Does Meta-Learning Help?
Meta-learning provides benefit when:

- New datasets resemble those in the meta-database, i.e., are drawn from a similar P(D)
- Meta-features actually capture the properties that drive performance differences
- The meta-database is large and diverse enough to reveal stable patterns
Negative Transfer Risk:
Meta-learning can hurt when:

- The new dataset lies far outside the historical distribution (negative transfer)
- Meta-features miss the properties that matter, so "similar" datasets behave differently
- The meta-database is small or biased toward particular domains
Theoretical Bounds:
Informally, meta-learning sample complexity is related to:

- The number and diversity of historical datasets (tasks) available
- The dimensionality of the meta-feature space
- How strongly the performance functions f_D are shared across tasks
Meta-learning for new algorithms is challenging—if a new algorithm isn't in the meta-database, we can't recommend it. Solutions include: algorithm meta-features (characterize algorithms themselves), multi-fidelity evaluation (cheap trials of new algorithms), and active meta-learning (strategically evaluate new algorithms on diverse datasets).
A meta-learning system is only as good as its meta-database. Let's discuss how to build and maintain effective meta-databases.
Data Collection Strategies:
1. Exhaustive Evaluation
Run all algorithms on all datasets:

- Produces a complete performance matrix with no missing entries
- Cost grows as n_datasets × n_algorithms model evaluations, which quickly becomes prohibitive
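A rough back-of-envelope calculation (all numbers assumed for illustration) shows how quickly exhaustive evaluation multiplies:

```python
# Assumed, illustrative numbers: not from any real benchmark.
n_datasets, n_algorithms, cv_folds = 200, 15, 5
avg_fit_minutes = 2.0  # assumed average time per single model fit

total_fits = n_datasets * n_algorithms * cv_folds
total_hours = total_fits * avg_fit_minutes / 60
print(total_fits, "fits,", round(total_hours), "compute-hours")
# 15000 fits, 500 compute-hours
```

Even at two minutes per fit, a modest portfolio demands hundreds of compute-hours, which is why the strategies below trade completeness for cost.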
2. Active Meta-Learning
Strategically select which (dataset, algorithm) pairs to evaluate:

- Prioritize cells where a current meta-model is most uncertain, or that are most informative for ranking algorithms
- Achieves much of the value of exhaustive evaluation at a fraction of the cost
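One simple selection heuristic, sketched below as my own illustration rather than a specific published method, uses per-algorithm score variance as an uncertainty proxy: evaluate the missing cell belonging to the algorithm whose observed results vary the most.

```python
import numpy as np

# Partially observed dataset × algorithm performance matrix (NaN = not run).
# Values are invented for illustration.
perf = np.array([
    [0.80, 0.75, np.nan],
    [0.82, np.nan, 0.60],
    [np.nan, 0.74, 0.91],
])

def next_experiment(perf: np.ndarray) -> tuple:
    """Return the (dataset, algorithm) index of the unevaluated cell
    belonging to the algorithm with the highest observed score variance."""
    col_var = np.nanvar(perf, axis=0)        # uncertainty proxy per algorithm
    missing = np.argwhere(np.isnan(perf))    # candidate cells to evaluate
    best = max(missing, key=lambda ij: col_var[ij[1]])
    return tuple(best)

print(next_experiment(perf))  # (0, 2): algorithm 2's scores vary most
```

Real active meta-learning would use a proper predictive model's uncertainty rather than raw column variance, but the loop is the same: score candidate cells, evaluate the most informative one, update, repeat.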
3. Incremental Collection
Add experiments opportunistically:

- Log the results of every production AutoML run into the meta-database
- Coverage grows for free, but is biased toward configurations the optimizer already favored
```python
import numpy as np
import time
from sklearn.model_selection import cross_val_score
from sklearn.ensemble import RandomForestClassifier, GradientBoostingClassifier
from sklearn.svm import SVC
from sklearn.linear_model import LogisticRegression
from sklearn.neighbors import KNeighborsClassifier
from sklearn.neural_network import MLPClassifier
from typing import Dict, Callable, List


class MetaDatabaseBuilder:
    """
    Systematic meta-database construction.

    Evaluates multiple algorithms on multiple datasets to build
    a comprehensive meta-database for meta-learning.
    """

    def __init__(self, algorithms: Dict[str, Callable] = None):
        """
        Parameters:
            algorithms: Dict mapping names to (estimator_factory, param_grid)
        """
        if algorithms is None:
            # Default algorithm portfolio
            self.algorithms = {
                'RandomForest': (lambda: RandomForestClassifier(n_estimators=100, random_state=42), {}),
                'GradientBoosting': (lambda: GradientBoostingClassifier(random_state=42), {}),
                'SVM_RBF': (lambda: SVC(kernel='rbf', random_state=42), {}),
                'SVM_Linear': (lambda: SVC(kernel='linear', random_state=42), {}),
                'LogisticRegression': (lambda: LogisticRegression(max_iter=200, random_state=42), {}),
                'KNN': (lambda: KNeighborsClassifier(), {}),
                'NeuralNetwork': (lambda: MLPClassifier(max_iter=500, random_state=42), {}),
            }
        else:
            self.algorithms = algorithms

        self.meta_database = MetaDatabase()
        self.meta_feature_extractor = None  # Set this

    def set_meta_feature_extractor(self, extractor: Callable):
        """Set the function that extracts meta-features from datasets."""
        self.meta_feature_extractor = extractor

    def evaluate_dataset(self, X: np.ndarray, y: np.ndarray,
                         dataset_id: str, cv: int = 5,
                         verbose: bool = True) -> Dict[str, float]:
        """
        Evaluate all algorithms on a single dataset.

        Parameters:
            X: Feature matrix
            y: Target vector
            dataset_id: Unique identifier for this dataset
            cv: Cross-validation folds
            verbose: Print progress

        Returns:
            Dict mapping algorithm names to CV accuracy
        """
        if self.meta_feature_extractor is None:
            raise ValueError("Set meta_feature_extractor first!")

        # Extract meta-features
        meta_features = self.meta_feature_extractor(X, y)

        # Evaluate each algorithm
        results = {}
        for algo_name, (estimator_factory, _) in self.algorithms.items():
            try:
                start_time = time.time()
                estimator = estimator_factory()
                scores = cross_val_score(estimator, X, y, cv=cv, scoring='accuracy')
                elapsed = time.time() - start_time
                results[algo_name] = (scores.mean(), elapsed)
                if verbose:
                    print(f"  {algo_name}: {scores.mean():.4f} (±{scores.std():.4f}) "
                          f"in {elapsed:.1f}s")
            except Exception as e:
                if verbose:
                    print(f"  {algo_name}: FAILED ({e})")
                results[algo_name] = (np.nan, np.nan)

        # Add to meta-database
        self.meta_database.add_batch(dataset_id, meta_features, results)
        return {k: v[0] for k, v in results.items()}

    def evaluate_multiple_datasets(self, datasets: List[Dict], verbose: bool = True):
        """
        Evaluate algorithms across multiple datasets.

        Parameters:
            datasets: List of {'id': str, 'X': array, 'y': array} dicts
            verbose: Print progress
        """
        for i, dataset in enumerate(datasets):
            if verbose:
                print(f"Dataset {i+1}/{len(datasets)}: {dataset['id']}")
            self.evaluate_dataset(
                X=dataset['X'],
                y=dataset['y'],
                dataset_id=dataset['id'],
                verbose=verbose
            )
        if verbose:
            print("=== Evaluation Complete ===")
            self.meta_database.summary()


# Example: Build meta-database from OpenML datasets
def build_from_openml(n_datasets: int = 50):
    """Build a meta-database from OpenML benchmark datasets."""
    import openml
    from openml.tasks import TaskType

    # Get classification tasks
    tasks = openml.tasks.list_tasks(
        task_type=TaskType.SUPERVISED_CLASSIFICATION,
        output_format='dataframe'
    )

    # Filter sensible datasets
    tasks = tasks[
        (tasks['NumberOfInstances'] >= 100) &
        (tasks['NumberOfInstances'] <= 10000) &
        (tasks['NumberOfFeatures'] <= 100)
    ].head(n_datasets)

    builder = MetaDatabaseBuilder()

    # Define meta-feature extractor (simplified)
    def extract_mf(X, y):
        return np.array([
            X.shape[0],                # n_samples
            X.shape[1],                # n_features
            len(np.unique(y)),         # n_classes
            X.shape[1] / X.shape[0],   # dimensionality ratio
            np.std(X),                 # feature std
        ])

    builder.set_meta_feature_extractor(extract_mf)

    datasets = []
    for task_id in tasks['tid']:
        try:
            task = openml.tasks.get_task(task_id)
            X, y = task.get_X_and_y()
            # Handle missing values
            X = np.nan_to_num(X)
            datasets.append({
                'id': f"openml_{task_id}",
                'X': X,
                'y': y
            })
        except Exception as e:
            print(f"Skipping task {task_id}: {e}")

    builder.evaluate_multiple_datasets(datasets)
    return builder.meta_database
```

Meta-Database Quality Metrics:

- Coverage: the fraction of the dataset × algorithm matrix that has been evaluated
- Diversity: how widely the stored datasets spread across the meta-feature space
- Consistency: whether all experiments used the same CV protocol and performance metric, so values are comparable
Maintenance Challenges:

- Library and algorithm versions change, so stored performances can silently go stale
- New algorithms enter with no history, the cold-start problem noted above
- Changing the meta-feature extractor invalidates stored meta-features and forces recomputation
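Coverage, the simplest of the quality metrics, reduces to a few lines over the performance matrix (values below are invented for illustration):

```python
import numpy as np

# Toy dataset × algorithm performance matrix; NaN marks missing experiments.
perf = np.array([
    [0.81, 0.79, np.nan, 0.70],
    [0.77, np.nan, np.nan, 0.72],
    [0.85, 0.80, 0.83, np.nan],
])

observed = ~np.isnan(perf)
coverage = observed.mean()       # fraction of cells evaluated
per_algo = observed.sum(axis=0)  # experiments recorded per algorithm

print(f"coverage = {coverage:.2f}")           # 8 of 12 cells -> 0.67
print(f"per-algorithm counts = {per_algo}")   # [3 2 1 2]
```

The per-algorithm counts expose imbalance that overall coverage hides: here algorithm 2 has only one recorded experiment, so any meta-model's predictions for it rest on very thin evidence.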
We've explored how meta-learning enables intelligent algorithm selection by learning from past experiments. Let's consolidate the key takeaways:

- Meta-learning predicts algorithm performance from dataset meta-features, trained on a meta-database of (meta-features, algorithm, performance) triplets.
- Instance-based (k-NN) methods are simple, robust, and power Auto-sklearn's warm-starting; model-based methods generalize further but can overfit sparse meta-data.
- Transfer helps only when new datasets resemble the historical distribution; negative transfer is a real risk.
- The meta-database's coverage, diversity, and consistency bound everything built on top of it.
What's Next:
Meta-learning provides intelligent initialization for CASH optimization. The next page explores warm starting—how to leverage meta-learning predictions to dramatically accelerate hyperparameter optimization by starting from promising configurations rather than random initialization.
You now understand how meta-learning enables efficient algorithm selection by transferring knowledge from past experiments. This capability is what allows AutoML systems to find good configurations in minutes rather than hours—by starting from informed rather than random positions.