A Regression Tree is a powerful supervised learning algorithm that predicts continuous numerical values by recursively partitioning the feature space into regions. Unlike classification trees that output discrete class labels, regression trees predict real-valued outputs by averaging the target values within each terminal node (leaf).
The algorithm builds a tree structure through recursive binary splitting. At each internal node, the algorithm selects the optimal feature and threshold value that best separates the data, creating two child partitions. This process continues until stopping criteria are met.
The quality of a split is evaluated using the Mean Squared Error (MSE) reduction. For a potential split that divides data into left and right partitions, the algorithm calculates:
$$\text{MSE}_{\text{parent}} = \frac{1}{N} \sum_{i=1}^{N} (y_i - \bar{y})^2$$

$$\text{MSE}_{\text{split}} = \frac{N_{\text{left}}}{N} \cdot \text{MSE}_{\text{left}} + \frac{N_{\text{right}}}{N} \cdot \text{MSE}_{\text{right}}$$
The optimal split maximizes the reduction $\text{MSE}_{\text{parent}} - \text{MSE}_{\text{split}}$.
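The split criterion above can be computed with a couple of small helper functions (a minimal sketch; the function names are my own):

```python
def mse(values):
    # Mean squared error of a partition around its own mean
    mean = sum(values) / len(values)
    return sum((v - mean) ** 2 for v in values) / len(values)

def mse_reduction(parent, left, right):
    # Weighted child MSE, subtracted from the parent MSE
    n = len(parent)
    weighted = len(left) / n * mse(left) + len(right) / n * mse(right)
    return mse(parent) - weighted

# Splitting [0.0, 0.0, 1.0, 1.0] into two pure halves removes all
# variance, so the reduction equals the parent MSE (0.25).
print(mse_reduction([0.0, 0.0, 1.0, 1.0], [0.0, 0.0], [1.0, 1.0]))  # 0.25
```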
For each feature, candidate split thresholds are computed as the midpoint between consecutive unique sorted values of that feature. This ensures that splits occur at boundaries that actually separate data points.
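The midpoint rule can be sketched directly (the function name is illustrative):

```python
def candidate_thresholds(feature_values):
    # Midpoints between consecutive unique sorted values
    uniq = sorted(set(feature_values))
    return [(a + b) / 2 for a, b in zip(uniq, uniq[1:])]

print(candidate_thresholds([0, 1, 2, 3]))  # [0.5, 1.5, 2.5]
```

Because duplicates are collapsed first, repeated feature values produce no redundant thresholds: `candidate_thresholds([1, 1, 2])` yields `[1.5]`.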
When a node becomes a leaf (terminal node), its prediction value is simply the arithmetic mean of all target values in that partition:
$$\text{prediction} = \frac{1}{n} \sum_{i=1}^{n} y_i$$
The tree construction halts and creates a leaf node when any of the following conditions are met:
• The current depth reaches max_depth.
• The node contains fewer than min_samples_split samples.

Implement a function that constructs a regression tree from the training data and uses it to generate predictions for the test samples. For example:
X_train = [[0], [1], [2], [3]]
y_train = [0.0, 0.0, 1.0, 1.0]
X_test = [[0.5], [2.5]]
max_depth = 1
min_samples_split = 2

Expected output: [0.0, 1.0]

The algorithm identifies the optimal split at threshold 1.5 (the midpoint between feature values 1 and 2). This partitions the data as follows:
• Left partition (feature ≤ 1.5): contains samples with indices 0 and 1, where y = [0.0, 0.0]. The mean prediction is 0.0.
• Right partition (feature > 1.5): contains samples with indices 2 and 3, where y = [1.0, 1.0]. The mean prediction is 1.0.
For test predictions:
• X_test[0] = 0.5 ≤ 1.5 → routes to the left partition → predicts 0.0
• X_test[1] = 2.5 > 1.5 → routes to the right partition → predicts 1.0
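The walkthrough above can be reproduced end to end with a short sketch of the full algorithm. This is an illustrative implementation under the stated assumptions (greedy midpoint splits, mean-of-leaf predictions), not a reference solution; the names `learn_tree` and `predict` are my own:

```python
def mse(y):
    # Mean squared error of targets around their mean
    m = sum(y) / len(y)
    return sum((v - m) ** 2 for v in y) / len(y)

def learn_tree(X, y, depth, max_depth, min_samples_split):
    # Leaf: depth limit reached, too few samples, or node already pure
    if depth >= max_depth or len(y) < min_samples_split or mse(y) == 0.0:
        return {"leaf": sum(y) / len(y)}
    best = None  # (weighted child MSE, feature, threshold, left idx, right idx)
    for f in range(len(X[0])):
        uniq = sorted(set(row[f] for row in X))
        for a, b in zip(uniq, uniq[1:]):
            t = (a + b) / 2  # midpoint candidate threshold
            left = [i for i in range(len(y)) if X[i][f] <= t]
            right = [i for i in range(len(y)) if X[i][f] > t]
            score = (len(left) * mse([y[i] for i in left])
                     + len(right) * mse([y[i] for i in right])) / len(y)
            if best is None or score < best[0]:
                best = (score, f, t, left, right)
    if best is None:  # no valid split: all feature values identical
        return {"leaf": sum(y) / len(y)}
    _, f, t, left, right = best
    return {
        "feature": f, "threshold": t,
        "left": learn_tree([X[i] for i in left], [y[i] for i in left],
                           depth + 1, max_depth, min_samples_split),
        "right": learn_tree([X[i] for i in right], [y[i] for i in right],
                            depth + 1, max_depth, min_samples_split),
    }

def predict(tree, x):
    # Route a sample down to a leaf and return its stored mean
    while "leaf" not in tree:
        side = "left" if x[tree["feature"]] <= tree["threshold"] else "right"
        tree = tree[side]
    return tree["leaf"]

# Example 1 from the text: one split at threshold 1.5
tree = learn_tree([[0], [1], [2], [3]], [0.0, 0.0, 1.0, 1.0],
                  depth=0, max_depth=1, min_samples_split=2)
print([predict(tree, x) for x in [[0.5], [2.5]]])  # [0.0, 1.0]
```

Note the strict `<` comparison when updating `best`: on ties it keeps the earlier candidate, i.e. the lower feature index and lower threshold, which is one common (but implementation-dependent) tie-breaking choice.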
X_train = [[0, 0], [0, 1], [1, 0], [1, 1]]
y_train = [0.0, 1.0, 1.0, 2.0]
X_test = [[0.5, 0.5], [0, 0.5]]
max_depth = 2
min_samples_split = 2

Expected output: [0.0, 0.0]

With two features, the algorithm evaluates splits on both dimensions. The tree structure depends on which feature provides the maximum MSE reduction at each node.
At depth 1, feature 0 (the first column) with threshold 0.5 provides an effective split:
• Left partition (feature 0 ≤ 0.5): y = [0.0, 1.0], mean = 0.5
• Right partition (feature 0 > 0.5): y = [1.0, 2.0], mean = 1.5
Further splits at depth 2 refine these partitions. Both test samples ultimately route to the partition containing y = 0.0, resulting in predictions of [0.0, 0.0].
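The depth-1 numbers can be checked directly. Note that in this example both features give the same weighted child MSE at the root, so which one is used depends on the implementation's tie-breaking rule; this sketch assumes ties break toward the lower feature index:

```python
def mse(vals):
    m = sum(vals) / len(vals)
    return sum((v - m) ** 2 for v in vals) / len(vals)

y = [0.0, 1.0, 1.0, 2.0]
parent = mse(y)  # 0.5

# Feature 0 at 0.5 groups rows [0, 1] vs [2, 3]; feature 1 at 0.5
# groups rows [0, 2] vs [1, 3]. Both induce the same target split.
split_f0 = (2 * mse([0.0, 1.0]) + 2 * mse([1.0, 2.0])) / 4
split_f1 = (2 * mse([0.0, 1.0]) + 2 * mse([1.0, 2.0])) / 4
print(parent, split_f0, split_f1)  # 0.5 0.25 0.25 -> a tie

# At depth 2 in the left branch (feature 0 <= 0.5), splitting on
# feature 1 at 0.5 isolates the sample with y = 0.0. Both test
# samples have feature 0 <= 0.5 and feature 1 <= 0.5, so both
# land in that leaf.
```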
X_train = [[0], [1], [2], [3], [4], [5]]
y_train = [0.0, 0.5, 1.0, 3.0, 3.5, 4.0]
X_test = [[0.5], [2.5], [4.5]]
max_depth = 3
min_samples_split = 2

Expected output: [0.0, 1.0, 3.5]

With a deeper tree, the algorithm creates more granular partitions:
• At the root, the best split separates the low values (0.0, 0.5, 1.0) from the high values (3.0, 3.5, 4.0).
• Subsequent splits further partition these regions.
The final predictions emerge from navigating the test samples through the tree:
• X_test[0] = 0.5 → routes to the leftmost region → predicts 0.0 (the leaf contains y = [0.0])
• X_test[1] = 2.5 → routes to a middle region → predicts 1.0
• X_test[2] = 4.5 → routes to the rightmost region → predicts 3.5
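The root split claimed above can be verified with a brute-force search over all midpoint thresholds (a small standalone check using the same MSE criterion):

```python
X = [0, 1, 2, 3, 4, 5]
y = [0.0, 0.5, 1.0, 3.0, 3.5, 4.0]

def mse(vals):
    if not vals:
        return 0.0
    m = sum(vals) / len(vals)
    return sum((v - m) ** 2 for v in vals) / len(vals)

best = None  # (weighted child MSE, threshold)
for a, b in zip(X, X[1:]):
    t = (a + b) / 2  # midpoint candidate threshold
    left = [yi for xi, yi in zip(X, y) if xi <= t]
    right = [yi for xi, yi in zip(X, y) if xi > t]
    score = (len(left) * mse(left) + len(right) * mse(right)) / len(y)
    if best is None or score < best[0]:
        best = (score, t)

print(best)  # the threshold 2.5 minimizes the weighted child MSE
```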
Constraints