In machine learning, the performance of a model is highly sensitive to its hyperparameters—configuration settings that are not learned from data but must be specified before training begins. Finding the optimal combination of hyperparameters is a critical step in the model development pipeline, often making the difference between a mediocre model and a state-of-the-art one.
Exhaustive Hyperparameter Search (commonly known as Grid Search) is a systematic approach to hyperparameter optimization. Given a set of hyperparameters and their candidate values, this technique constructs the complete Cartesian product of all possible combinations and evaluates each one to identify the best-performing configuration.
How It Works:
Consider a machine learning model with two hyperparameters:
learning_rate: [0.01, 0.1, 1.0]
max_depth: [5, 10]

The exhaustive search will evaluate all 6 combinations:
| Combination | learning_rate | max_depth |
|---|---|---|
| 1 | 0.01 | 5 |
| 2 | 0.01 | 10 |
| 3 | 0.1 | 5 |
| 4 | 0.1 | 10 |
| 5 | 1.0 | 5 |
| 6 | 1.0 | 10 |
For each combination, the method:
1. Trains the model on the training data using that combination's hyperparameter values.
2. Generates predictions for the validation data.
3. Computes a score with the scoring function and records it.
The combination with the highest recorded score is selected.
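The enumeration of combinations above can be sketched with `itertools.product` (a minimal illustration; the variable names here are not part of any required API):

```python
from itertools import product

# Candidate values for each hyperparameter (the grid from the table above)
param_grid = {'learning_rate': [0.01, 0.1, 1.0], 'max_depth': [5, 10]}

# Fix a key order, then take the Cartesian product of the value lists
keys = list(param_grid)
combinations = [dict(zip(keys, values))
                for values in product(*(param_grid[k] for k in keys))]

for i, combo in enumerate(combinations, start=1):
    print(i, combo)  # 6 combinations, in the same order as the table
```

Because `product` iterates the last value list fastest, the output order matches the table row order exactly.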
Your Task:
Implement a function that performs exhaustive hyperparameter search. The function should:
• Accept training data (X_train, y_train), validation data (X_val, y_val), a parameter grid dictionary, a model function, and a scoring function
• Return the best-performing parameter combination along with its score

Function Signatures:
model_fn(X_train, y_train, X_val, **params) -> predictions
scoring_fn(y_true, y_pred) -> score # Higher scores indicate better performance
Tie-Breaking Rule: When multiple parameter combinations achieve the same best score, return the first one encountered based on the natural iteration order through the Cartesian product of parameter values.
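One way to implement the search, including the tie-breaking rule, is sketched below; the function name `grid_search` is illustrative, not mandated by the problem. Using a strictly-greater-than comparison means a later combination that merely equals the best score never replaces the earlier one:

```python
from itertools import product

def grid_search(X_train, y_train, X_val, y_val, param_grid, model_fn, scoring_fn):
    """Evaluate every combination in the Cartesian product of param_grid values.

    Returns (best_params, best_score). On ties, the first combination in
    natural iteration order wins, because the comparison is strictly greater-than.
    """
    keys = list(param_grid)
    best_params, best_score = None, float('-inf')
    for values in product(*(param_grid[k] for k in keys)):
        params = dict(zip(keys, values))
        predictions = model_fn(X_train, y_train, X_val, **params)
        score = scoring_fn(y_val, predictions)
        if score > best_score:  # strict: earlier ties are kept
            best_params, best_score = params, score
    return best_params, best_score
```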
X_train = [[0, 0], [1, 0], [2, 0], [3, 0], [4, 0]]
y_train = [0, 1, 1, 0, 0]
X_val = [[1.5, 0]]
y_val = [1]
param_grid = {'k': [1, 3, 5]}
model_fn = knn_model
scoring_fn = accuracy

Expected output: ({'k': 1}, 1.0)

The search evaluates three configurations:
• k=1: Finds the 1 nearest neighbor to [1.5, 0]. The closest training point is [1, 0] with label 1. Prediction: 1, Accuracy: 1.0 (correct)
• k=3: Finds the 3 nearest neighbors: [1, 0], [2, 0], [0, 0] with labels [1, 1, 0]. Majority vote: 1. Prediction: 1, Accuracy: 1.0 (correct)
• k=5: Considers all 5 training points with labels [0, 1, 1, 0, 0]. Majority vote: 0. Prediction: 0, Accuracy: 0.0 (incorrect)
Both k=1 and k=3 achieve perfect accuracy of 1.0. Following the tie-breaking rule, k=1 (encountered first) is returned.
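The `knn_model` and `accuracy` helpers are not specified by the problem; a minimal sketch consistent with the walkthrough above (Euclidean distance, majority vote, with Python's stable sort breaking distance ties by training order) could look like:

```python
from collections import Counter
import math

def knn_model(X_train, y_train, X_val, k):
    """Predict each validation point by majority vote among its k nearest
    training points (Euclidean distance; stable sort breaks distance ties
    in favor of earlier training points)."""
    predictions = []
    for x in X_val:
        order = sorted(range(len(X_train)),
                       key=lambda i: math.dist(x, X_train[i]))
        labels = [y_train[i] for i in order[:k]]
        # Counter.most_common orders equal counts by first encounter,
        # so majority-vote ties also resolve deterministically
        predictions.append(Counter(labels).most_common(1)[0][0])
    return predictions

def accuracy(y_true, y_pred):
    """Fraction of validation labels predicted correctly."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)
```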
X_train = [[0], [1], [2], [3], [4], [5]]
y_train = [0, 0, 1, 1, 0, 1]
X_val = [[1.5], [3.5]]
y_val = [0, 1]
param_grid = {'k': [1, 3]}
model_fn = knn_model
scoring_fn = accuracy

Expected output: ({'k': 1}, 1.0)

Evaluating k-NN with different neighborhood sizes:
• k=1: For [1.5], the points [1] and [2] are equidistant (0.5 away), so the prediction depends on how the neighbor search breaks distance ties; resolving toward the earlier training point picks [1] (label 0), which is correct. For [3.5], [3] and [4] are likewise equidistant, and the same tie-breaking picks [3] (label 1), which is correct. Accuracy: 1.0.
• k=3: For [1.5], considers neighbors [1], [2], [0] with labels [0, 1, 0], majority 0 (correct). For [3.5], considers [3], [4], [5] with labels [1, 0, 1], majority 1 (correct). Accuracy: 1.0.
Both achieve 1.0 accuracy, but k=1 is returned as it was evaluated first.
X_train = [[1], [2], [3], [4], [5]]
y_train = [2.1, 4.0, 5.9, 8.1, 10.0]
X_val = [[6], [7]]
y_val = [12.0, 14.0]
param_grid = {'alpha': [0.01, 0.1], 'iterations': [50, 100]}
model_fn = linear_regression
scoring_fn = negative_mse

Expected output: ({'alpha': 0.1, 'iterations': 100}, -0.0071)

This example uses linear regression with gradient descent, searching over learning rates and iteration counts.
The parameter grid creates 4 combinations: (alpha=0.01, iterations=50), (alpha=0.01, iterations=100), (alpha=0.1, iterations=50), and (alpha=0.1, iterations=100).
The data follows approximately y = 2x, so the model should learn weights close to [0, 2]. Higher learning rates and more iterations allow better convergence.
The combination (alpha=0.1, iterations=100) achieves the best negative MSE of -0.0071, indicating predictions very close to the true values of [12.0, 14.0].
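Like the k-NN helpers, `linear_regression` and `negative_mse` are left to the implementer. A minimal sketch, assuming batch gradient descent on a single weight plus intercept with the mean residual as the gradient (no factor of 2) and zero initialization, is shown below; the exact score of -0.0071 depends on these conventions, so treat the number as illustrative:

```python
def linear_regression(X_train, y_train, X_val, alpha, iterations):
    """Fit y ~ w*x + b by batch gradient descent, then predict on X_val.
    Assumes one feature per sample; step uses the mean residual (no factor 2)."""
    n = len(X_train)
    w, b = 0.0, 0.0
    for _ in range(iterations):
        residuals = [w * x[0] + b - y for x, y in zip(X_train, y_train)]
        w -= alpha * sum(r * x[0] for r, x in zip(residuals, X_train)) / n
        b -= alpha * sum(residuals) / n
    return [w * x[0] + b for x in X_val]

def negative_mse(y_true, y_pred):
    """Negated mean squared error, so that higher scores are better."""
    return -sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)
```

With these conventions, alpha=0.1 keeps every gradient-descent step contractive on this data, so the combination with the larger learning rate and more iterations converges closest to the least-squares fit and wins the search.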
Constraints