Adaptive Boosting (AdaBoost) is one of the most influential ensemble learning algorithms in machine learning history. Introduced by Freund and Schapire in 1996, it revolutionized the field by demonstrating how a collection of "weak learners" — classifiers that perform only slightly better than random guessing — can be combined into a highly accurate "strong learner."
Unlike bagging algorithms (such as Random Forests) that train learners independently, boosting algorithms train learners sequentially, with each subsequent learner focusing on the mistakes made by its predecessors. AdaBoost achieves this through an elegant mechanism of adaptive sample weighting: after each round, the weights of misclassified samples are increased, so the next learner concentrates on the examples the ensemble currently gets wrong.
In this implementation, we use decision stumps as weak learners. A decision stump makes predictions based on a single feature threshold:
$$h(x) = \begin{cases} +1 & \text{if } p \cdot x_j > p \cdot \theta \\ -1 & \text{otherwise} \end{cases}$$
Where $x_j$ is the value of feature $j$, $\theta$ is the decision threshold, and $p \in \{+1, -1\}$ is the polarity, which flips the direction of the inequality.
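The stump rule above can be sketched directly in NumPy. This is a minimal illustration (the helper name `stump_predict` is ours; the parameter names mirror the output format used in the examples below):

```python
import numpy as np

def stump_predict(X, feature_index, threshold, polarity):
    """Predict +1/-1 using a single-feature threshold rule.

    Implements h(x) = +1 if p * x_j > p * theta, else -1.
    """
    predictions = np.ones(X.shape[0])
    feature_values = X[:, feature_index]
    # Polarity +1: predict -1 where x_j <= theta.
    # Polarity -1: the comparison is flipped, so predict -1 where x_j >= theta.
    if polarity == 1:
        predictions[feature_values <= threshold] = -1
    else:
        predictions[feature_values >= threshold] = -1
    return predictions
```

With polarity -1, samples whose feature value lies *below* the threshold are predicted as +1, which is the configuration that appears in all three examples below.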
After training each weak classifier, sample weights are updated using:
$$w_i^{(t+1)} = w_i^{(t)} \cdot \exp(-\alpha_t \cdot y_i \cdot h_t(x_i))$$
Where $\alpha_t$ is the classifier weight computed as:
$$\alpha_t = \frac{1}{2} \ln\left(\frac{1 - \epsilon_t}{\epsilon_t}\right)$$
Here, $\epsilon_t$ is the weighted error rate of the t-th weak learner. Note that when $\epsilon_t$ is very small (near-perfect classification), $\alpha_t$ becomes large, giving that classifier more voting power in the final ensemble.
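The two update formulas can be worked through on a small concrete case. In this sketch, one of four equally weighted samples is misclassified, so $\epsilon_t = 0.25$:

```python
import numpy as np

# Suppose the weak learner misclassifies 1 of 4 equally weighted samples.
w = np.full(4, 0.25)
y = np.array([1, 1, -1, -1])
h = np.array([1, 1, -1, 1])            # last sample is misclassified

eps = np.sum(w[h != y])                # weighted error: 0.25
alpha = 0.5 * np.log((1 - eps) / eps)  # 0.5 * ln(3) ≈ 0.5493

# Weight update: y*h is -1 for misclassified samples, so they gain weight.
w = w * np.exp(-alpha * y * h)
w = w / np.sum(w)                      # renormalize to sum to 1
```

After renormalization the single misclassified sample carries half of the total weight, so the next weak learner is strongly pulled toward classifying it correctly.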
Implement the adaboost_train function that trains an AdaBoost ensemble classifier. Your function should:
Return a list of dictionaries containing the polarity, threshold, feature_index, and alpha for each weak learner.

Important Implementation Details:
X = np.array([[1.0, 2.0], [2.0, 3.0], [3.0, 4.0], [4.0, 5.0]])
y = np.array([1, 1, -1, -1])
n_clf = 3

Output:
[
{"polarity": -1, "threshold": 2.5, "feature_index": 0, "alpha": 11.5129},
{"polarity": -1, "threshold": 2.5, "feature_index": 0, "alpha": 11.5129},
{"polarity": -1, "threshold": 2.5, "feature_index": 0, "alpha": 11.5129}
]

The dataset has 4 samples with 2 features each. Samples at positions 0 and 1 belong to class +1 (feature 0 values: 1.0 and 2.0), while samples at positions 2 and 3 belong to class -1 (feature 0 values: 3.0 and 4.0).
Optimal Decision Stump: The best threshold is 2.5 on feature 0 with polarity -1. This means samples with feature 0 below 2.5 are predicted as +1, and samples at or above 2.5 are predicted as -1.
This stump perfectly separates the two classes, resulting in zero classification error. When the error approaches zero, alpha becomes very large: 11.5129 ≈ 0.5 × ln((1 − ε)/ε) with ε floored at a small constant such as 1e-10 to avoid division by zero.
Since this stump achieves perfect classification, all three weak learners converge to the same optimal configuration.
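One way the full training loop could look is sketched below. This is a non-authoritative reference implementation under two assumptions consistent with the examples: candidate thresholds are midpoints between consecutive sorted feature values (which yields 2.5, 6.5, and 5.0 in the three examples), and the weighted error is floored at 1e-10, which reproduces alpha ≈ 11.5129 on a perfect split:

```python
import numpy as np

def adaboost_train(X, y, n_clf):
    """Train n_clf decision stumps with AdaBoost; return their parameters."""
    n_samples, n_features = X.shape
    w = np.full(n_samples, 1.0 / n_samples)  # uniform initial weights
    classifiers = []
    for _ in range(n_clf):
        best_error, best, best_preds = np.inf, None, None
        for feature_index in range(n_features):
            values = np.unique(X[:, feature_index])
            # Candidate thresholds: midpoints between consecutive values
            # (an assumption matching the 2.5 / 6.5 / 5.0 thresholds above).
            thresholds = (values[:-1] + values[1:]) / 2.0
            for threshold in thresholds:
                for polarity in (1, -1):
                    # h(x) = +1 if p * x_j > p * theta, else -1
                    preds = np.where(
                        polarity * X[:, feature_index] > polarity * threshold,
                        1.0, -1.0)
                    error = np.sum(w[preds != y])
                    if error < best_error:
                        best_error = error
                        best = {"polarity": polarity,
                                "threshold": float(threshold),
                                "feature_index": feature_index}
                        best_preds = preds
        eps = max(best_error, 1e-10)  # floor keeps alpha finite on perfect splits
        alpha = 0.5 * np.log((1.0 - eps) / eps)
        best["alpha"] = alpha
        # Reweight: misclassified samples (y * h = -1) gain weight.
        w = w * np.exp(-alpha * y * best_preds)
        w = w / np.sum(w)
        classifiers.append(best)
    return classifiers
```

On the first example, every round finds the same zero-error stump (threshold 2.5, feature 0, polarity -1), so all three returned dictionaries coincide, as shown in the expected output.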
X = np.array([[1.0, 1.0], [2.0, 2.0], [3.0, 3.0], [10.0, 10.0], [11.0, 11.0], [12.0, 12.0]])
y = np.array([1, 1, 1, -1, -1, -1])
n_clf = 2

Output:
[
{"polarity": -1, "threshold": 6.5, "feature_index": 0, "alpha": 11.5129},
{"polarity": -1, "threshold": 6.5, "feature_index": 0, "alpha": 11.5129}
]

This dataset has two clearly separated clusters: class +1 samples have low feature values (1-3), while class -1 samples have high values (10-12).
Optimal Threshold Selection: The midpoint between the closest samples of different classes is (3 + 10) / 2 = 6.5. With polarity -1 and threshold 6.5 on feature 0, samples below 6.5 are predicted +1 and samples above 6.5 are predicted -1, classifying every sample correctly.
Both weak learners identify this same perfect split, achieving zero error and maximum alpha values.
X = np.array([[1.0], [2.0], [3.0], [7.0], [8.0], [9.0]])
y = np.array([1, 1, 1, -1, -1, -1])
n_clf = 2

Output:
[
{"polarity": -1, "threshold": 5.0, "feature_index": 0, "alpha": 11.5129},
{"polarity": -1, "threshold": 5.0, "feature_index": 0, "alpha": 11.5129}
]

With a single feature and clear class separation, the optimal threshold lies at the midpoint of the gap: (3 + 7) / 2 = 5.0.
Using polarity -1 (the threshold comparison is flipped), values below 5.0 are predicted +1 and values above 5.0 are predicted -1.
The single-feature case demonstrates how even simple threshold-based rules can achieve perfect separation when data is linearly separable.
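Once trained, the ensemble classifies by a weighted vote, $\text{sign}\left(\sum_t \alpha_t h_t(x)\right)$. A minimal sketch of that prediction step (the helper name `adaboost_predict` is ours; the dictionary keys mirror the output format above):

```python
import numpy as np

def adaboost_predict(X, classifiers):
    """Weighted vote of the trained stumps: sign(sum_t alpha_t * h_t(x))."""
    scores = np.zeros(X.shape[0])
    for clf in classifiers:
        p, theta, j = clf["polarity"], clf["threshold"], clf["feature_index"]
        # Each stump votes +1/-1, scaled by its alpha.
        preds = np.where(p * X[:, j] > p * theta, 1.0, -1.0)
        scores += clf["alpha"] * preds
    return np.sign(scores)
```

Because both stumps in this example are identical and have large alpha, the vote simply reproduces the single stump's perfect labeling of the six samples.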
Constraints