In probabilistic machine learning, Bayesian inference over functions offers a powerful paradigm for making predictions while quantifying uncertainty. Unlike parametric models that learn fixed parameters, this approach maintains a distribution over all possible functions that could explain the observed data.
The Bayesian Function Predictor is a non-parametric model that defines a prior distribution over functions using only two key components: a mean function (typically assumed to be zero) and a covariance function (also called a kernel). The kernel encodes our assumptions about the function's smoothness, periodicity, and other structural properties.
Given training data points ((X_{train}, y_{train})) and test points (X_{test}), the predictor operates as follows:
The kernel (k(x_i, x_j)) measures the similarity between any two input points. For a linear kernel:
$$k(x_i, x_j) = \sigma_b^2 + \sigma_v^2 \cdot (x_i^T \cdot x_j)$$
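As a minimal NumPy sketch (the function name `linear_kernel` is illustrative, not part of the specification), this kernel can be evaluated for all pairs of points at once:

```python
import numpy as np

def linear_kernel(X1, X2, sigma_b=0.0, sigma_v=1.0):
    """Evaluate k(x_i, x_j) = sigma_b^2 + sigma_v^2 * (x_i . x_j)
    for every pair of rows in X1 (n, d) and X2 (m, d) -> (n, m) matrix."""
    X1, X2 = np.atleast_2d(X1), np.atleast_2d(X2)
    return sigma_b**2 + sigma_v**2 * (X1 @ X2.T)
```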
Where (\sigma_b^2) is the bias variance and (\sigma_v^2) is the weight (slope) variance of the kernel.
Construct three covariance matrices:
- (K_{train,train}): the kernel evaluated between all pairs of training points; adding the noise variance to its diagonal gives (K_y = K_{train,train} + \sigma_n^2 I)
- (K_{test,train}): the kernel between each test point and each training point
- (K_{test,test}): the kernel between all pairs of test points
The predicted mean at test points is computed as:
$$\mu_{test} = K_{test,train} \cdot K_y^{-1} \cdot y_{train}$$
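In code, the explicit inverse is best replaced by a linear solve, which is cheaper and numerically more stable. A hedged sketch (the helper name `posterior_mean` is an assumption, not part of the required API):

```python
import numpy as np

def posterior_mean(K_test_train, K_y, y_train):
    """Compute mu_test = K_test,train @ K_y^{-1} @ y_train.

    np.linalg.solve(K_y, y_train) computes K_y^{-1} @ y_train without
    ever forming the inverse matrix.
    """
    return K_test_train @ np.linalg.solve(K_y, y_train)
```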
This formula elegantly combines the kernel similarity structure with the observed target values to produce predictions that interpolate smoothly through the training data.
Implement the BayesianFunctionPredictor class with the following methods:
__init__(kernel, kernel_params, noise): Initialize the predictor with the specified kernel type, its hyperparameters, and observation noise variance.
fit(X_train, y_train): Condition the prior on the observed training data by computing and storing necessary matrices.
predict(X_test): Return the posterior mean predictions for the test points, formatted to 4 decimal places.
Note: Your implementation should handle multi-dimensional input features and return predictions as a NumPy array.
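Pulling the pieces together, one possible sketch of the class follows. It is not the only valid implementation: only the linear kernel is handled here, and rounding with `np.round` is one reading of "formatted to 4 decimal places" while still returning a NumPy array.

```python
import numpy as np

class BayesianFunctionPredictor:
    """Sketch of the predictor described above (linear kernel only)."""

    def __init__(self, kernel, kernel_params, noise):
        self.kernel = kernel
        self.kernel_params = kernel_params
        self.noise = noise

    def _k(self, X1, X2):
        # Covariance between every row of X1 and every row of X2.
        if self.kernel == "linear":
            sb = self.kernel_params.get("sigma_b", 0.0)
            sv = self.kernel_params.get("sigma_v", 1.0)
            return sb**2 + sv**2 * (X1 @ X2.T)
        raise ValueError(f"unknown kernel: {self.kernel}")

    def fit(self, X_train, y_train):
        # Store training data and the noisy training covariance K_y.
        self.X_train = np.atleast_2d(np.asarray(X_train, dtype=float))
        self.y_train = np.asarray(y_train, dtype=float)
        K = self._k(self.X_train, self.X_train)
        self.K_y = K + self.noise * np.eye(len(self.X_train))
        return self

    def predict(self, X_test):
        # Posterior mean: K_test,train @ K_y^{-1} @ y_train,
        # computed with a linear solve instead of an explicit inverse.
        X_test = np.atleast_2d(np.asarray(X_test, dtype=float))
        K_star = self._k(X_test, self.X_train)
        mu = K_star @ np.linalg.solve(self.K_y, self.y_train)
        return np.round(mu, 4)
```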
X_train = [[1.0], [2.0], [4.0]]
y_train = [3.0, 5.0, 9.0]
X_test = [[3.0]]
kernel = "linear"
kernel_params = {"sigma_b": 0.0, "sigma_v": 1.0}
noise = 1e-8
Output: 7.0000
The training data follows a perfect linear relationship: y = 2x + 1.
With x = 1 → y = 3, x = 2 → y = 5, x = 4 → y = 9.
Using a linear kernel with sigma_b = 0.0 (no bias variance) and sigma_v = 1.0 (unit slope variance), the predictor effectively learns this linear function.
For x_test = 3.0, the prediction is 7.0000, matching y = 2(3) + 1 = 7.
The extremely small noise (1e-8) acts only as regularization that keeps (K_y) invertible; with it, the posterior mean closely fits the training trend and recovers the value of the underlying linear function at the test point.
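This example can be checked numerically with a few lines of NumPy (a standalone sketch of the same computation):

```python
import numpy as np

# Example 1 worked end to end: sigma_b = 0 and sigma_v = 1 reduce the
# linear kernel to the plain dot product x . x'.
X = np.array([[1.0], [2.0], [4.0]])
y = np.array([3.0, 5.0, 9.0])
X_star = np.array([[3.0]])
noise = 1e-8

K_y = X @ X.T + noise * np.eye(len(X))  # K_train,train + noise * I
K_star = X_star @ X.T                   # K_test,train
mu = K_star @ np.linalg.solve(K_y, y)   # posterior mean, ~7.0
```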
X_train = [[0.0], [1.0], [2.0], [3.0]]
y_train = [0.0, 2.0, 4.0, 6.0]
X_test = [[0.5], [1.5], [2.5]]
kernel = "linear"
kernel_params = {"sigma_b": 0.0, "sigma_v": 1.0}
noise = 1e-6
Output: ["1.0000", "3.0000", "5.0000"]
The training data represents y = 2x (a perfect linear function through the origin).
The predictor with a linear kernel learns this relationship and makes predictions at the intermediate points: x = 0.5 → 1.0000, x = 1.5 → 3.0000, x = 2.5 → 5.0000.
Since the data is perfectly linear and noise is minimal, predictions at any point along this line are highly accurate.
X_train = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
y_train = [1.0, 1.0, 2.0]
X_test = [[0.5, 0.5]]
kernel = "linear"
kernel_params = {"sigma_b": 0.0, "sigma_v": 1.0}
noise = 1e-6
Output: 1.0000
This example demonstrates multi-dimensional input features. The training data is consistent with y = x₁ + x₂:
(1, 0) → 1, (0, 1) → 1, (1, 1) → 2.
For the test point (0.5, 0.5): y = 0.5 + 0.5 = 1.0, so the predictor outputs 1.0000.
This shows the predictor correctly captures the additive linear relationship across multiple input dimensions.
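The same few lines of NumPy confirm the multi-dimensional case (again a standalone sketch, not the required implementation):

```python
import numpy as np

# Example 3: two-dimensional inputs with the same dot-product kernel;
# the (n, d) @ (d, m) matrix products handle any input dimension.
X = np.array([[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]])
y = np.array([1.0, 1.0, 2.0])
X_star = np.array([[0.5, 0.5]])
noise = 1e-6

K_y = X @ X.T + noise * np.eye(len(X))
mu = (X_star @ X.T) @ np.linalg.solve(K_y, y)  # ~1.0
```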
Constraints