A Regression Tree is a powerful supervised learning algorithm that predicts continuous numerical values by recursively partitioning the feature space into regions. Unlike classification trees that output discrete class labels, regression trees predict real-valued outputs by averaging the target values within each terminal node (leaf).
The algorithm builds a tree structure through recursive binary splitting. At each internal node, the algorithm selects the optimal feature and threshold value that best separates the data, creating two child partitions. This process continues until stopping criteria are met.
The quality of a split is evaluated using the Mean Squared Error (MSE) reduction. For a potential split that divides data into left and right partitions, the algorithm calculates:
$$\text{MSE}_{\text{parent}} = \frac{1}{N} \sum_{i=1}^{N} (y_i - \bar{y})^2$$

$$\text{MSE}_{\text{split}} = \frac{N_{\text{left}}}{N} \cdot \text{MSE}_{\text{left}} + \frac{N_{\text{right}}}{N} \cdot \text{MSE}_{\text{right}}$$
The optimal split maximizes the reduction $\text{MSE}_{\text{parent}} - \text{MSE}_{\text{split}}$.
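The split criterion above can be computed with a couple of small helper functions (a minimal sketch; the function names are my own):

```python
def mse(values):
    # Mean squared error of a partition around its own mean
    mean = sum(values) / len(values)
    return sum((v - mean) ** 2 for v in values) / len(values)

def mse_reduction(parent, left, right):
    # Weighted child MSE, subtracted from the parent MSE
    n = len(parent)
    weighted = len(left) / n * mse(left) + len(right) / n * mse(right)
    return mse(parent) - weighted

# Splitting [0.0, 0.0, 1.0, 1.0] into two pure halves removes all
# variance, so the reduction equals the parent MSE (0.25).
print(mse_reduction([0.0, 0.0, 1.0, 1.0], [0.0, 0.0], [1.0, 1.0]))  # 0.25
```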
For each feature, candidate split thresholds are computed as the midpoint between consecutive unique sorted values of that feature. This ensures that splits occur at boundaries that actually separate data points.
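The midpoint rule can be sketched directly (the function name is illustrative):

```python
def candidate_thresholds(feature_values):
    # Midpoints between consecutive unique sorted values
    uniq = sorted(set(feature_values))
    return [(a + b) / 2 for a, b in zip(uniq, uniq[1:])]

print(candidate_thresholds([0, 1, 2, 3]))  # [0.5, 1.5, 2.5]
```

Because duplicates are collapsed first, repeated feature values produce no redundant thresholds: `candidate_thresholds([1, 1, 2])` yields `[1.5]`.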
When a node becomes a leaf (terminal node), its prediction value is simply the arithmetic mean of all target values in that partition:
$$\text{prediction} = \frac{1}{n} \sum_{i=1}^{n} y_i$$
The tree construction halts and creates a leaf node when any of the following conditions are met:
• The current depth reaches max_depth.
• The node contains fewer than min_samples_split samples.

Implement a function that constructs a regression tree from the training data and uses it to generate predictions for the test samples. For example:
X_train = [[0], [1], [2], [3]]
y_train = [0.0, 0.0, 1.0, 1.0]
X_test = [[0.5], [2.5]]
max_depth = 1
min_samples_split = 2

Expected output: [0.0, 1.0]

The algorithm identifies the optimal split at threshold 1.5 (the midpoint between feature values 1 and 2). This partitions the data as follows:
• Left partition (feature ≤ 1.5): contains samples with indices 0 and 1, where y = [0.0, 0.0]. The mean prediction is 0.0.
• Right partition (feature > 1.5): contains samples with indices 2 and 3, where y = [1.0, 1.0]. The mean prediction is 1.0.
For test predictions:
• X_test[0] = 0.5 ≤ 1.5 → routes to the left partition → predicts 0.0
• X_test[1] = 2.5 > 1.5 → routes to the right partition → predicts 1.0
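The walkthrough above can be reproduced end to end with a short sketch of the full algorithm. This is an illustrative implementation under the stated assumptions (greedy midpoint splits, mean-of-leaf predictions), not a reference solution; the names `learn_tree` and `predict` are my own:

```python
def mse(y):
    # Mean squared error of targets around their mean
    m = sum(y) / len(y)
    return sum((v - m) ** 2 for v in y) / len(y)

def learn_tree(X, y, depth, max_depth, min_samples_split):
    # Leaf: depth limit reached, too few samples, or node already pure
    if depth >= max_depth or len(y) < min_samples_split or mse(y) == 0.0:
        return {"leaf": sum(y) / len(y)}
    best = None  # (weighted child MSE, feature, threshold, left idx, right idx)
    for f in range(len(X[0])):
        uniq = sorted(set(row[f] for row in X))
        for a, b in zip(uniq, uniq[1:]):
            t = (a + b) / 2  # midpoint candidate threshold
            left = [i for i in range(len(y)) if X[i][f] <= t]
            right = [i for i in range(len(y)) if X[i][f] > t]
            score = (len(left) * mse([y[i] for i in left])
                     + len(right) * mse([y[i] for i in right])) / len(y)
            if best is None or score < best[0]:
                best = (score, f, t, left, right)
    if best is None:  # no valid split: all feature values identical
        return {"leaf": sum(y) / len(y)}
    _, f, t, left, right = best
    return {
        "feature": f, "threshold": t,
        "left": learn_tree([X[i] for i in left], [y[i] for i in left],
                           depth + 1, max_depth, min_samples_split),
        "right": learn_tree([X[i] for i in right], [y[i] for i in right],
                            depth + 1, max_depth, min_samples_split),
    }

def predict(tree, x):
    # Route a sample down to a leaf and return its stored mean
    while "leaf" not in tree:
        side = "left" if x[tree["feature"]] <= tree["threshold"] else "right"
        tree = tree[side]
    return tree["leaf"]

# Example 1 from the text: one split at threshold 1.5
tree = learn_tree([[0], [1], [2], [3]], [0.0, 0.0, 1.0, 1.0],
                  depth=0, max_depth=1, min_samples_split=2)
print([predict(tree, x) for x in [[0.5], [2.5]]])  # [0.0, 1.0]
```

Note the strict `<` comparison when updating `best`: on ties it keeps the earlier candidate, i.e. the lower feature index and lower threshold, which is one common (but implementation-dependent) tie-breaking choice.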
X_train = [[0, 0], [0, 1], [1, 0], [1, 1]]
y_train = [0.0, 1.0, 1.0, 2.0]
X_test = [[0.5, 0.5], [0, 0.5]]
max_depth = 2
min_samples_split = 2

Expected output: [0.0, 0.0]

With two features, the algorithm evaluates splits on both dimensions. The tree structure depends on which feature provides the maximum MSE reduction at each node.
At depth 1, feature 0 (the first column) with threshold 0.5 provides an effective split:
• Left partition (feature 0 ≤ 0.5): y = [0.0, 1.0], mean = 0.5
• Right partition (feature 0 > 0.5): y = [1.0, 2.0], mean = 1.5
Further splits at depth 2 refine these partitions. Both test samples ultimately route to the partition containing y = 0.0, resulting in predictions of [0.0, 0.0].
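The depth-1 numbers can be checked directly. Note that in this example both features give the same weighted child MSE at the root, so which one is used depends on the implementation's tie-breaking rule; this sketch assumes ties break toward the lower feature index:

```python
def mse(vals):
    m = sum(vals) / len(vals)
    return sum((v - m) ** 2 for v in vals) / len(vals)

y = [0.0, 1.0, 1.0, 2.0]
parent = mse(y)  # 0.5

# Feature 0 at 0.5 groups rows [0, 1] vs [2, 3]; feature 1 at 0.5
# groups rows [0, 2] vs [1, 3]. Both induce the same target split.
split_f0 = (2 * mse([0.0, 1.0]) + 2 * mse([1.0, 2.0])) / 4
split_f1 = (2 * mse([0.0, 1.0]) + 2 * mse([1.0, 2.0])) / 4
print(parent, split_f0, split_f1)  # 0.5 0.25 0.25 -> a tie

# At depth 2 in the left branch (feature 0 <= 0.5), splitting on
# feature 1 at 0.5 isolates the sample with y = 0.0. Both test
# samples have feature 0 <= 0.5 and feature 1 <= 0.5, so both
# land in that leaf.
```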
X_train = [[0], [1], [2], [3], [4], [5]]
y_train = [0.0, 0.5, 1.0, 3.0, 3.5, 4.0]
X_test = [[0.5], [2.5], [4.5]]
max_depth = 3
min_samples_split = 2

Expected output: [0.0, 1.0, 3.5]

With a deeper tree, the algorithm creates more granular partitions:
• At the root, the best split separates the low values (0.0, 0.5, 1.0) from the high values (3.0, 3.5, 4.0).
• Subsequent splits further partition these regions.
The final predictions emerge from navigating the test samples through the tree:
• X_test[0] = 0.5 → routes to the leftmost region → predicts 0.0 (the leaf contains y = [0.0])
• X_test[1] = 2.5 → routes to a middle region → predicts 1.0
• X_test[2] = 4.5 → routes to the rightmost region → predicts 3.5
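The root split claimed above can be verified with a brute-force search over all midpoint thresholds (a small standalone check using the same MSE criterion):

```python
X = [0, 1, 2, 3, 4, 5]
y = [0.0, 0.5, 1.0, 3.0, 3.5, 4.0]

def mse(vals):
    if not vals:
        return 0.0
    m = sum(vals) / len(vals)
    return sum((v - m) ** 2 for v in vals) / len(vals)

best = None  # (weighted child MSE, threshold)
for a, b in zip(X, X[1:]):
    t = (a + b) / 2  # midpoint candidate threshold
    left = [yi for xi, yi in zip(X, y) if xi <= t]
    right = [yi for xi, yi in zip(X, y) if xi > t]
    score = (len(left) * mse(left) + len(right) * mse(right)) / len(y)
    if best is None or score < best[0]:
        best = (score, t)

print(best)  # the threshold 2.5 minimizes the weighted child MSE
```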
Constraints