In machine learning, the performance of a model is highly sensitive to its hyperparameters—configuration settings that are not learned from data but must be specified before training begins. Finding the optimal combination of hyperparameters is a critical step in the model development pipeline, often making the difference between a mediocre model and a state-of-the-art one.
Exhaustive Hyperparameter Search (commonly known as Grid Search) is a systematic approach to hyperparameter optimization. Given a set of hyperparameters and their candidate values, this technique constructs the complete Cartesian product of all possible combinations and evaluates each one to identify the best-performing configuration.
How It Works:
Consider a machine learning model with two hyperparameters:
learning_rate: [0.01, 0.1, 1.0]
max_depth: [5, 10]

The exhaustive search will evaluate all 6 combinations:
| Combination | learning_rate | max_depth |
|---|---|---|
| 1 | 0.01 | 5 |
| 2 | 0.01 | 10 |
| 3 | 0.1 | 5 |
| 4 | 0.1 | 10 |
| 5 | 1.0 | 5 |
| 6 | 1.0 | 10 |
For each combination, the method:
1. Trains the model on the training data using that combination's hyperparameter values.
2. Generates predictions for the validation data.
3. Computes a score with the scoring function and records it.
The combination with the highest recorded score is selected.
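The enumeration of combinations above can be sketched with `itertools.product` (a minimal illustration; the variable names here are not part of any required API):

```python
from itertools import product

# Candidate values for each hyperparameter (the grid from the table above)
param_grid = {'learning_rate': [0.01, 0.1, 1.0], 'max_depth': [5, 10]}

# Fix a key order, then take the Cartesian product of the value lists
keys = list(param_grid)
combinations = [dict(zip(keys, values))
                for values in product(*(param_grid[k] for k in keys))]

for i, combo in enumerate(combinations, start=1):
    print(i, combo)  # 6 combinations, in the same order as the table
```

Because `product` iterates the last value list fastest, the output order matches the table row order exactly.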
Your Task:
Implement a function that performs exhaustive hyperparameter search. The function should:
• Accept training data (X_train, y_train), validation data (X_val, y_val), a parameter grid dictionary, a model function, and a scoring function
• Return the best-performing parameter combination along with its score

Function Signatures:
model_fn(X_train, y_train, X_val, **params) -> predictions
scoring_fn(y_true, y_pred) -> score # Higher scores indicate better performance
Tie-Breaking Rule: When multiple parameter combinations achieve the same best score, return the first one encountered based on the natural iteration order through the Cartesian product of parameter values.
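One way to implement the search, including the tie-breaking rule, is sketched below; the function name `grid_search` is illustrative, not mandated by the problem. Using a strictly-greater-than comparison means a later combination that merely equals the best score never replaces the earlier one:

```python
from itertools import product

def grid_search(X_train, y_train, X_val, y_val, param_grid, model_fn, scoring_fn):
    """Evaluate every combination in the Cartesian product of param_grid values.

    Returns (best_params, best_score). On ties, the first combination in
    natural iteration order wins, because the comparison is strictly greater-than.
    """
    keys = list(param_grid)
    best_params, best_score = None, float('-inf')
    for values in product(*(param_grid[k] for k in keys)):
        params = dict(zip(keys, values))
        predictions = model_fn(X_train, y_train, X_val, **params)
        score = scoring_fn(y_val, predictions)
        if score > best_score:  # strict: earlier ties are kept
            best_params, best_score = params, score
    return best_params, best_score
```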
X_train = [[0, 0], [1, 0], [2, 0], [3, 0], [4, 0]]
y_train = [0, 1, 1, 0, 0]
X_val = [[1.5, 0]]
y_val = [1]
param_grid = {'k': [1, 3, 5]}
model_fn = knn_model
scoring_fn = accuracy

Expected output: ({'k': 1}, 1.0)

The search evaluates three configurations:
• k=1: Finds the 1 nearest neighbor to [1.5, 0]. The closest training point is [1, 0] with label 1. Prediction: 1, Accuracy: 1.0 (correct)
• k=3: Finds the 3 nearest neighbors: [1, 0], [2, 0], [0, 0] with labels [1, 1, 0]. Majority vote: 1. Prediction: 1, Accuracy: 1.0 (correct)
• k=5: Considers all 5 training points with labels [0, 1, 1, 0, 0]. Majority vote: 0. Prediction: 0, Accuracy: 0.0 (incorrect)
Both k=1 and k=3 achieve perfect accuracy of 1.0. Following the tie-breaking rule, k=1 (encountered first) is returned.
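The `knn_model` and `accuracy` helpers are not specified by the problem; a minimal sketch consistent with the walkthrough above (Euclidean distance, majority vote, with Python's stable sort breaking distance ties by training order) could look like:

```python
from collections import Counter
import math

def knn_model(X_train, y_train, X_val, k):
    """Predict each validation point by majority vote among its k nearest
    training points (Euclidean distance; stable sort breaks distance ties
    in favor of earlier training points)."""
    predictions = []
    for x in X_val:
        order = sorted(range(len(X_train)),
                       key=lambda i: math.dist(x, X_train[i]))
        labels = [y_train[i] for i in order[:k]]
        # Counter.most_common orders equal counts by first encounter,
        # so majority-vote ties also resolve deterministically
        predictions.append(Counter(labels).most_common(1)[0][0])
    return predictions

def accuracy(y_true, y_pred):
    """Fraction of validation labels predicted correctly."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)
```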
X_train = [[0], [1], [2], [3], [4], [5]]
y_train = [0, 0, 1, 1, 0, 1]
X_val = [[1.5], [3.5]]
y_val = [0, 1]
param_grid = {'k': [1, 3]}
model_fn = knn_model
scoring_fn = accuracy

Expected output: ({'k': 1}, 1.0)

Evaluating k-NN with different neighborhood sizes:
• k=1: For [1.5], the points [1] and [2] are equidistant (0.5 away), so the prediction depends on how the neighbor search breaks distance ties; resolving toward the earlier training point picks [1] (label 0), which is correct. For [3.5], [3] and [4] are likewise equidistant, and the same tie-breaking picks [3] (label 1), which is correct. Accuracy: 1.0.
• k=3: For [1.5], considers neighbors [1], [2], [0] with labels [0, 1, 0], majority 0 (correct). For [3.5], considers [3], [4], [5] with labels [1, 0, 1], majority 1 (correct). Accuracy: 1.0.
Both achieve 1.0 accuracy, but k=1 is returned as it was evaluated first.
X_train = [[1], [2], [3], [4], [5]]
y_train = [2.1, 4.0, 5.9, 8.1, 10.0]
X_val = [[6], [7]]
y_val = [12.0, 14.0]
param_grid = {'alpha': [0.01, 0.1], 'iterations': [50, 100]}
model_fn = linear_regression
scoring_fn = negative_mse

Expected output: ({'alpha': 0.1, 'iterations': 100}, -0.0071)

This example uses linear regression with gradient descent, searching over learning rates and iteration counts.
The parameter grid creates 4 combinations: (alpha=0.01, iterations=50), (alpha=0.01, iterations=100), (alpha=0.1, iterations=50), and (alpha=0.1, iterations=100).
The data follows approximately y = 2x, so the model should learn weights close to [0, 2]. Higher learning rates and more iterations allow better convergence.
The combination (alpha=0.1, iterations=100) achieves the best negative MSE of -0.0071, indicating predictions very close to the true values of [12.0, 14.0].
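Like the k-NN helpers, `linear_regression` and `negative_mse` are left to the implementer. A minimal sketch, assuming batch gradient descent on a single weight plus intercept with the mean residual as the gradient (no factor of 2) and zero initialization, is shown below; the exact score of -0.0071 depends on these conventions, so treat the number as illustrative:

```python
def linear_regression(X_train, y_train, X_val, alpha, iterations):
    """Fit y ~ w*x + b by batch gradient descent, then predict on X_val.
    Assumes one feature per sample; step uses the mean residual (no factor 2)."""
    n = len(X_train)
    w, b = 0.0, 0.0
    for _ in range(iterations):
        residuals = [w * x[0] + b - y for x, y in zip(X_train, y_train)]
        w -= alpha * sum(r * x[0] for r, x in zip(residuals, X_train)) / n
        b -= alpha * sum(residuals) / n
    return [w * x[0] + b for x in X_val]

def negative_mse(y_true, y_pred):
    """Negated mean squared error, so that higher scores are better."""
    return -sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)
```

With these conventions, alpha=0.1 keeps every gradient-descent step contractive on this data, so the combination with the larger learning rate and more iterations converges closest to the least-squares fit and wins the search.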
Constraints