In machine learning, probabilistic classifiers leverage the principles of Bayesian inference to make predictions based on observed data. One powerful approach specifically designed for binary feature spaces models the probability that each feature takes a value of 0 or 1 given the class label.
This classifier is particularly effective when dealing with presence/absence data, such as whether a word appears in a document, whether a user has performed a specific action, or whether a sensor has detected an event. The core idea is to learn the probability distribution of features for each class during training, then apply Bayes' theorem to compute the posterior probability of each class given new observations.
Mathematical Foundation:
For a sample with binary features x = [x₁, x₂, ..., xₙ], the classifier computes the posterior probability for each class c as:
$$P(c | x) \propto P(c) \cdot \prod_{i=1}^{n} P(x_i | c)$$
Where:
- P(c) is the prior probability of class c
- P(xᵢ | c) is the likelihood of feature value xᵢ given class c
For binary features, the likelihood is modeled as: $$P(x_i = 1 | c) = \theta_{ic}$$ $$P(x_i = 0 | c) = 1 - \theta_{ic}$$
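Since each feature is Bernoulli, the two cases can be written compactly as θ^x · (1 − θ)^(1−x). A minimal helper (illustrative, not part of the required API):

```python
def bernoulli_likelihood(x, theta):
    """P(x_i = x | c) for a binary feature, where theta = P(x_i = 1 | c).

    Equivalent to theta**x * (1 - theta)**(1 - x) for x in {0, 1}.
    """
    return theta if x == 1 else 1.0 - theta
```

For example, with theta = 0.8, an observed 1 has likelihood 0.8 and an observed 0 has likelihood 0.2.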
Laplace Smoothing:
To prevent zero probabilities when a feature value is never observed for a particular class in the training data, we apply Laplace smoothing (also known as additive smoothing):
$$\theta_{ic} = \frac{\text{count}(x_i = 1, y = c) + \alpha}{\text{count}(y = c) + 2\alpha}$$
Where α is the smoothing parameter (typically 1.0).
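The smoothed estimate translates directly to code (the function name is illustrative):

```python
def smoothed_theta(count_x1_and_c, count_c, alpha=1.0):
    """Laplace-smoothed estimate of theta_ic = P(x_i = 1 | c).

    count_x1_and_c: number of class-c samples where feature i equals 1
    count_c:        total number of class-c samples
    """
    return (count_x1_and_c + alpha) / (count_c + 2.0 * alpha)
```

Even when a feature is never active for a class, the estimate stays nonzero: smoothed_theta(0, 2) gives (0 + 1) / (2 + 2) = 0.25 instead of 0.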
Numerical Stability:
To avoid underflow when multiplying many small probabilities, all computations should be performed in log-probability space:
$$\log P(c | x) = \log P(c) + \sum_{i=1}^{n} \log P(x_i | c)$$
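A log-space scorer might look like this (illustrative helper; `thetas` holds P(xᵢ = 1 | c) for a single class):

```python
import math

def log_posterior_score(x, log_prior, thetas):
    """Unnormalized log P(c | x) = log P(c) + sum_i log P(x_i | c)."""
    score = log_prior
    for xi, theta in zip(x, thetas):
        # P(x_i | c) is theta when x_i = 1, otherwise 1 - theta
        score += math.log(theta if xi == 1 else 1.0 - theta)
    return score
```

The class with the largest score wins; no normalization is needed because the evidence term P(x) is the same for every class.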
Your Task:
Implement a Python class BinaryProbabilisticClassifier with the following methods:
- forward(self, X, y): Train the model by computing class priors and feature probabilities from the training data
- predict(self, X): Return predicted class labels (0 or 1) for the test samples

Additionally, implement a solve function that creates a classifier instance, trains it, and returns predictions.
Requirements:
Example 1:

X_train = [[1, 0, 1], [1, 1, 0], [0, 0, 1], [0, 1, 0], [1, 1, 1]]
y_train = [1, 1, 0, 0, 1]
X_test = [[1, 0, 1]]
smoothing = 1.0

Expected output: [1]

Training Phase: The model learns from 5 training samples with 3 binary features each.
Class Priors (with Laplace smoothing):
- P(y = 0) = (2 + 1) / (5 + 2) = 3/7 ≈ 0.43
- P(y = 1) = (3 + 1) / (5 + 2) = 4/7 ≈ 0.57

Feature Probabilities for Class 0 (samples: [0,0,1], [0,1,0]):
- P(x₁ = 1 | 0) = (0 + 1) / (2 + 2) = 0.25
- P(x₂ = 1 | 0) = (1 + 1) / (2 + 2) = 0.50
- P(x₃ = 1 | 0) = (1 + 1) / (2 + 2) = 0.50

Feature Probabilities for Class 1 (samples: [1,0,1], [1,1,0], [1,1,1]):
- P(x₁ = 1 | 1) = (3 + 1) / (3 + 2) = 0.80
- P(x₂ = 1 | 1) = (2 + 1) / (3 + 2) = 0.60
- P(x₃ = 1 | 1) = (2 + 1) / (3 + 2) = 0.60

Prediction for [1, 0, 1]:
- Class 0: log(3/7) + log(0.25) + log(1 - 0.50) + log(0.50) ≈ -3.62
- Class 1: log(4/7) + log(0.80) + log(1 - 0.60) + log(0.60) ≈ -2.21

Since -2.21 > -3.62, the model predicts class 1.
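The full pipeline can be sketched as follows. This is one possible implementation, not the only valid one; it assumes class priors are smoothed the same way as the features, i.e. (count + α) / (N + K·α) with K classes, and names the training method forward to match the task spec:

```python
import math

class BinaryProbabilisticClassifier:
    """Bernoulli naive Bayes with Laplace smoothing (sketch)."""

    def __init__(self, smoothing=1.0):
        self.smoothing = smoothing

    def forward(self, X, y):
        a = self.smoothing
        n_samples, n_features = len(X), len(X[0])
        self.classes = sorted(set(y))
        self.log_prior = {}
        self.theta = {}
        for c in self.classes:
            rows = [x for x, label in zip(X, y) if label == c]
            # Smoothed class prior (assumed convention: same alpha as features)
            self.log_prior[c] = math.log(
                (len(rows) + a) / (n_samples + len(self.classes) * a)
            )
            # Smoothed per-feature P(x_i = 1 | c)
            self.theta[c] = [
                (sum(x[i] for x in rows) + a) / (len(rows) + 2.0 * a)
                for i in range(n_features)
            ]

    def predict(self, X):
        preds = []
        for x in X:
            scores = {}
            for c in self.classes:
                # Work in log space to avoid underflow
                s = self.log_prior[c]
                for xi, t in zip(x, self.theta[c]):
                    s += math.log(t if xi == 1 else 1.0 - t)
                scores[c] = s
            preds.append(max(scores, key=scores.get))
        return preds

def solve(X_train, y_train, X_test, smoothing=1.0):
    clf = BinaryProbabilisticClassifier(smoothing)
    clf.forward(X_train, y_train)
    return clf.predict(X_test)
```

Under these assumptions, solve on the data above returns [1], matching the expected output.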
Example 2:

X_train = [[0, 0], [0, 1], [1, 0], [1, 1]]
y_train = [0, 0, 1, 1]
X_test = [[0, 0], [1, 1]]
smoothing = 1.0

Expected output: [0, 1]

Training Phase: The training data shows a clear pattern: feature₁ is highly predictive of the class.
Class Distribution:
- 2 samples per class; smoothed priors P(y = 0) = P(y = 1) = (2 + 1) / (4 + 2) = 0.5

Feature Analysis:
- Feature 1: P(x₁ = 1 | 0) = (0 + 1) / (2 + 2) = 0.25 vs. P(x₁ = 1 | 1) = (2 + 1) / (2 + 2) = 0.75 (highly discriminative)
- Feature 2: P(x₂ = 1 | c) = (1 + 1) / (2 + 2) = 0.50 for both classes (uninformative)

Predictions:
- [0, 0]: log score ≈ -1.67 for class 0 vs. ≈ -2.77 for class 1, so predict 0
- [1, 1]: log score ≈ -2.77 for class 0 vs. ≈ -1.67 for class 1, so predict 1
The classifier correctly learns that the first feature is the key discriminator.
Example 3:

X_train = [[1, 0, 0], [0, 1, 0], [0, 0, 1], [1, 1, 0], [0, 1, 1], [1, 0, 1]]
y_train = [0, 0, 0, 1, 1, 1]
X_test = [[1, 0, 0], [0, 1, 1], [1, 1, 1]]
smoothing = 1.0

Expected output: [0, 1, 1]

Training Phase: Balanced dataset with 3 samples per class.
Class 0 samples: [1,0,0], [0,1,0], [0,0,1] Class 1 samples: [1,1,0], [0,1,1], [1,0,1]
Key Pattern Analysis:
- Each class-0 sample has exactly one active feature, so every feature is active in 1 of 3 samples: P(xᵢ = 1 | 0) = (1 + 1) / (3 + 2) = 0.4 for all i
- Each class-1 sample has exactly two active features, so every feature is active in 2 of 3 samples: P(xᵢ = 1 | 1) = (2 + 1) / (3 + 2) = 0.6 for all i
- Smoothed priors: P(y = 0) = P(y = 1) = (3 + 1) / (6 + 2) = 0.5

Predictions:
- [1, 0, 0]: log scores ≈ -2.63 (class 0) vs. ≈ -3.04 (class 1), so predict 0
- [0, 1, 1]: log scores ≈ -3.04 (class 0) vs. ≈ -2.63 (class 1), so predict 1
- [1, 1, 1]: log scores ≈ -3.44 (class 0) vs. ≈ -2.23 (class 1), so predict 1
The classifier learns the aggregate feature patterns that distinguish classes.
Constraints: