In machine learning, probabilistic classifiers leverage the principles of Bayesian inference to make predictions based on observed data. One powerful approach specifically designed for binary feature spaces models the probability that each feature takes a value of 0 or 1 given the class label.
This classifier is particularly effective when dealing with presence/absence data, such as whether a word appears in a document, whether a user has performed a specific action, or whether a sensor has detected an event. The core idea is to learn the probability distribution of features for each class during training, then apply Bayes' theorem to compute the posterior probability of each class given new observations.
Mathematical Foundation:
For a sample with binary features x = [x₁, x₂, ..., xₙ], the classifier computes the posterior probability for each class c as:
$$P(c | x) \propto P(c) \cdot \prod_{i=1}^{n} P(x_i | c)$$
Where:
- P(c) is the prior probability of class c
- P(xᵢ | c) is the likelihood of feature value xᵢ given class c
For binary features, the likelihood is modeled as: $$P(x_i = 1 | c) = \theta_{ic}$$ $$P(x_i = 0 | c) = 1 - \theta_{ic}$$
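Since each feature is Bernoulli, the two cases can be written compactly as θ^x · (1 − θ)^(1−x). A minimal helper (illustrative, not part of the required API):

```python
def bernoulli_likelihood(x, theta):
    """P(x_i = x | c) for a binary feature, where theta = P(x_i = 1 | c).

    Equivalent to theta**x * (1 - theta)**(1 - x) for x in {0, 1}.
    """
    return theta if x == 1 else 1.0 - theta
```

For example, with theta = 0.8, an observed 1 has likelihood 0.8 and an observed 0 has likelihood 0.2.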
Laplace Smoothing:
To prevent zero probabilities when a feature value is never observed for a particular class in the training data, we apply Laplace smoothing (also known as additive smoothing):
$$\theta_{ic} = \frac{\text{count}(x_i = 1, y = c) + \alpha}{\text{count}(y = c) + 2\alpha}$$
Where α is the smoothing parameter (typically 1.0).
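The smoothed estimate translates directly to code (the function name is illustrative):

```python
def smoothed_theta(count_x1_and_c, count_c, alpha=1.0):
    """Laplace-smoothed estimate of theta_ic = P(x_i = 1 | c).

    count_x1_and_c: number of class-c samples where feature i equals 1
    count_c:        total number of class-c samples
    """
    return (count_x1_and_c + alpha) / (count_c + 2.0 * alpha)
```

Even when a feature is never active for a class, the estimate stays nonzero: smoothed_theta(0, 2) gives (0 + 1) / (2 + 2) = 0.25 instead of 0.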
Numerical Stability:
To avoid underflow when multiplying many small probabilities, all computations should be performed in log-probability space:
$$\log P(c | x) = \log P(c) + \sum_{i=1}^{n} \log P(x_i | c)$$
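A log-space scorer might look like this (illustrative helper; `thetas` holds P(xᵢ = 1 | c) for a single class):

```python
import math

def log_posterior_score(x, log_prior, thetas):
    """Unnormalized log P(c | x) = log P(c) + sum_i log P(x_i | c)."""
    score = log_prior
    for xi, theta in zip(x, thetas):
        # P(x_i | c) is theta when x_i = 1, otherwise 1 - theta
        score += math.log(theta if xi == 1 else 1.0 - theta)
    return score
```

The class with the largest score wins; no normalization is needed because the evidence term P(x) is the same for every class.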
Your Task:
Implement a Python class BinaryProbabilisticClassifier with the following methods:
- forward(self, X, y): Train the model by computing class priors and feature probabilities from the training data
- predict(self, X): Return predicted class labels (0 or 1) for the test samples

Additionally, implement a solve function that creates a classifier instance, trains it, and returns predictions.
Requirements:
Example 1:

X_train = [[1, 0, 1], [1, 1, 0], [0, 0, 1], [0, 1, 0], [1, 1, 1]]
y_train = [1, 1, 0, 0, 1]
X_test = [[1, 0, 1]]
smoothing = 1.0

Expected output: [1]

Training Phase: The model learns from 5 training samples with 3 binary features each.
Class Priors (with Laplace smoothing):
- P(y = 0) = (2 + 1) / (5 + 2) = 3/7 ≈ 0.43
- P(y = 1) = (3 + 1) / (5 + 2) = 4/7 ≈ 0.57

Feature Probabilities for Class 0 (samples: [0,0,1], [0,1,0]):
- P(x₁ = 1 | 0) = (0 + 1) / (2 + 2) = 0.25
- P(x₂ = 1 | 0) = (1 + 1) / (2 + 2) = 0.50
- P(x₃ = 1 | 0) = (1 + 1) / (2 + 2) = 0.50

Feature Probabilities for Class 1 (samples: [1,0,1], [1,1,0], [1,1,1]):
- P(x₁ = 1 | 1) = (3 + 1) / (3 + 2) = 0.80
- P(x₂ = 1 | 1) = (2 + 1) / (3 + 2) = 0.60
- P(x₃ = 1 | 1) = (2 + 1) / (3 + 2) = 0.60

Prediction for [1, 0, 1]:
- Class 0: log(3/7) + log(0.25) + log(1 - 0.50) + log(0.50) ≈ -3.62
- Class 1: log(4/7) + log(0.80) + log(1 - 0.60) + log(0.60) ≈ -2.21

Since -2.21 > -3.62, the model predicts class 1.
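The full pipeline can be sketched as follows. This is one possible implementation, not the only valid one; it assumes class priors are smoothed the same way as the features, i.e. (count + α) / (N + K·α) with K classes, and names the training method forward to match the task spec:

```python
import math

class BinaryProbabilisticClassifier:
    """Bernoulli naive Bayes with Laplace smoothing (sketch)."""

    def __init__(self, smoothing=1.0):
        self.smoothing = smoothing

    def forward(self, X, y):
        a = self.smoothing
        n_samples, n_features = len(X), len(X[0])
        self.classes = sorted(set(y))
        self.log_prior = {}
        self.theta = {}
        for c in self.classes:
            rows = [x for x, label in zip(X, y) if label == c]
            # Smoothed class prior (assumed convention: same alpha as features)
            self.log_prior[c] = math.log(
                (len(rows) + a) / (n_samples + len(self.classes) * a)
            )
            # Smoothed per-feature P(x_i = 1 | c)
            self.theta[c] = [
                (sum(x[i] for x in rows) + a) / (len(rows) + 2.0 * a)
                for i in range(n_features)
            ]

    def predict(self, X):
        preds = []
        for x in X:
            scores = {}
            for c in self.classes:
                # Work in log space to avoid underflow
                s = self.log_prior[c]
                for xi, t in zip(x, self.theta[c]):
                    s += math.log(t if xi == 1 else 1.0 - t)
                scores[c] = s
            preds.append(max(scores, key=scores.get))
        return preds

def solve(X_train, y_train, X_test, smoothing=1.0):
    clf = BinaryProbabilisticClassifier(smoothing)
    clf.forward(X_train, y_train)
    return clf.predict(X_test)
```

Under these assumptions, solve on the data above returns [1], matching the expected output.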
Example 2:

X_train = [[0, 0], [0, 1], [1, 0], [1, 1]]
y_train = [0, 0, 1, 1]
X_test = [[0, 0], [1, 1]]
smoothing = 1.0

Expected output: [0, 1]

Training Phase: The training data shows a clear pattern: feature₁ is highly predictive of the class.
Class Distribution:
- 2 samples per class; smoothed priors P(y = 0) = P(y = 1) = (2 + 1) / (4 + 2) = 0.5

Feature Analysis:
- Feature 1: P(x₁ = 1 | 0) = (0 + 1) / (2 + 2) = 0.25 vs. P(x₁ = 1 | 1) = (2 + 1) / (2 + 2) = 0.75 (highly discriminative)
- Feature 2: P(x₂ = 1 | c) = (1 + 1) / (2 + 2) = 0.50 for both classes (uninformative)

Predictions:
- [0, 0]: log score ≈ -1.67 for class 0 vs. ≈ -2.77 for class 1, so predict 0
- [1, 1]: log score ≈ -2.77 for class 0 vs. ≈ -1.67 for class 1, so predict 1
The classifier correctly learns that the first feature is the key discriminator.
Example 3:

X_train = [[1, 0, 0], [0, 1, 0], [0, 0, 1], [1, 1, 0], [0, 1, 1], [1, 0, 1]]
y_train = [0, 0, 0, 1, 1, 1]
X_test = [[1, 0, 0], [0, 1, 1], [1, 1, 1]]
smoothing = 1.0

Expected output: [0, 1, 1]

Training Phase: Balanced dataset with 3 samples per class.
Class 0 samples: [1,0,0], [0,1,0], [0,0,1] Class 1 samples: [1,1,0], [0,1,1], [1,0,1]
Key Pattern Analysis:
- Each class-0 sample has exactly one active feature, so every feature is active in 1 of 3 samples: P(xᵢ = 1 | 0) = (1 + 1) / (3 + 2) = 0.4 for all i
- Each class-1 sample has exactly two active features, so every feature is active in 2 of 3 samples: P(xᵢ = 1 | 1) = (2 + 1) / (3 + 2) = 0.6 for all i
- Smoothed priors: P(y = 0) = P(y = 1) = (3 + 1) / (6 + 2) = 0.5

Predictions:
- [1, 0, 0]: log scores ≈ -2.63 (class 0) vs. ≈ -3.04 (class 1), so predict 0
- [0, 1, 1]: log scores ≈ -3.04 (class 0) vs. ≈ -2.63 (class 1), so predict 1
- [1, 1, 1]: log scores ≈ -3.44 (class 0) vs. ≈ -2.23 (class 1), so predict 1
The classifier learns the aggregate feature patterns that distinguish classes.
Constraints: