A classification tree is a powerful hierarchical structure used in machine learning to make predictions by recursively partitioning data based on feature values. At each internal node, the tree selects the most informative attribute to split the data, creating branches that lead to either further decision nodes or terminal leaf nodes containing class predictions.
The construction of an optimal classification tree relies on information theory concepts, specifically entropy and information gain. Entropy measures the impurity or uncertainty in a dataset—a set where all examples belong to the same class has zero entropy (perfectly pure), while a set with equally distributed classes has maximum entropy (maximum uncertainty).
Entropy Formula: For a set S with class proportions p₁, p₂, ..., pₖ:
$$H(S) = -\sum_{i=1}^{k} p_i \log_2(p_i)$$
Note: When p = 0, the term is defined as 0.
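As a quick sketch, the entropy formula can be computed directly from class counts (the function name `entropy` is illustrative, not part of the task):

```python
from collections import Counter
from math import log2

def entropy(labels):
    """Shannon entropy (base 2) of a sequence of class labels.

    Classes with zero examples never appear in the Counter, which
    matches the convention that a p = 0 term contributes 0.
    """
    n = len(labels)
    return -sum((c / n) * log2(c / n) for c in Counter(labels).values())

print(entropy(['Yes', 'No']))             # 1.0 (maximum for two classes)
print(entropy(['Yes'] * 5 + ['No'] * 3))  # ≈ 0.954
```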
Information Gain: The information gain of splitting on attribute A measures the reduction in entropy:
$$IG(S, A) = H(S) - \sum_{v \in Values(A)} \frac{|S_v|}{|S|} H(S_v)$$
where Sᵥ is the subset of S for which attribute A has value v.
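A minimal sketch of this computation, assuming each example is a dict mapping attribute names to values (as in the datasets below); `info_gain` and its helper are illustrative names:

```python
from collections import Counter
from math import log2

def entropy(labels):
    n = len(labels)
    return -sum((c / n) * log2(c / n) for c in Counter(labels).values())

def info_gain(examples, attr, target_attr):
    """IG(S, A): entropy of S minus the size-weighted entropy of each S_v."""
    n = len(examples)
    total = entropy([ex[target_attr] for ex in examples])
    remainder = 0.0
    for v in set(ex[attr] for ex in examples):
        subset = [ex[target_attr] for ex in examples if ex[attr] == v]
        remainder += (len(subset) / n) * entropy(subset)
    return total - remainder

# Two toy rows: 'F' splits the classes perfectly, so IG equals H(S) = 1 bit.
rows = [{'F': 'a', 'C': 'Yes'}, {'F': 'b', 'C': 'No'}]
print(info_gain(rows, 'F', 'C'))  # 1.0
```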
Tree Construction Algorithm (the classic ID3 procedure):
1. If all examples share the same target value, return that value as a leaf node.
2. If no attributes remain, return the majority target value among the examples.
3. Otherwise, select the attribute with the highest information gain, make it a decision node, partition the examples by that attribute's values, and recurse on each partition with the attribute removed.
Your Task: Implement a function that constructs a classification tree from a dataset. The function should accept a list of examples (each a dict mapping attribute names to values), a list of candidate attribute names, and the name of the target attribute, and return either a class label (for a leaf) or a nested dict of the form {attribute: {value: subtree}}, as in the following example:
examples = [
{'Outlook': 'Sunny', 'Wind': 'Weak', 'PlayTennis': 'No'},
{'Outlook': 'Overcast', 'Wind': 'Strong', 'PlayTennis': 'Yes'},
{'Outlook': 'Rain', 'Wind': 'Weak', 'PlayTennis': 'Yes'},
{'Outlook': 'Sunny', 'Wind': 'Strong', 'PlayTennis': 'No'},
{'Outlook': 'Sunny', 'Wind': 'Weak', 'PlayTennis': 'Yes'},
{'Outlook': 'Overcast', 'Wind': 'Weak', 'PlayTennis': 'Yes'},
{'Outlook': 'Rain', 'Wind': 'Strong', 'PlayTennis': 'No'},
{'Outlook': 'Rain', 'Wind': 'Weak', 'PlayTennis': 'Yes'}
]
attributes = ['Outlook', 'Wind']
target_attr = 'PlayTennis'

Expected output:

{
'Outlook': {
'Overcast': 'Yes',
'Rain': {'Wind': {'Strong': 'No', 'Weak': 'Yes'}},
'Sunny': {'Wind': {'Strong': 'No', 'Weak': 'No'}}
}
}

Step-by-step construction:
Calculate initial entropy: The target 'PlayTennis' has 5 'Yes' and 3 'No' → H(S) = -5/8·log₂(5/8) - 3/8·log₂(3/8) ≈ 0.954
Calculate information gain for each attribute: IG(S, Outlook) ≈ 0.266 and IG(S, Wind) ≈ 0.159
'Outlook' is selected as the root (highest gain)
Branch 'Overcast': All 2 examples have 'Yes' → Pure leaf node 'Yes'
Branch 'Sunny': 2 'No' and 1 'Yes' → Not pure, split on 'Wind'
Branch 'Rain': 2 'Yes' and 1 'No' → Not pure, split on 'Wind'
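The entropy and gain figures quoted in this walkthrough can be reproduced with a short script (helper functions restated inline; names are illustrative):

```python
from collections import Counter
from math import log2

def entropy(labels):
    n = len(labels)
    return -sum((c / n) * log2(c / n) for c in Counter(labels).values())

def info_gain(examples, attr, target_attr):
    n = len(examples)
    total = entropy([ex[target_attr] for ex in examples])
    remainder = 0.0
    for v in set(ex[attr] for ex in examples):
        subset = [ex[target_attr] for ex in examples if ex[attr] == v]
        remainder += (len(subset) / n) * entropy(subset)
    return total - remainder

# The eight PlayTennis rows from the example above.
data = [
    {'Outlook': 'Sunny', 'Wind': 'Weak', 'PlayTennis': 'No'},
    {'Outlook': 'Overcast', 'Wind': 'Strong', 'PlayTennis': 'Yes'},
    {'Outlook': 'Rain', 'Wind': 'Weak', 'PlayTennis': 'Yes'},
    {'Outlook': 'Sunny', 'Wind': 'Strong', 'PlayTennis': 'No'},
    {'Outlook': 'Sunny', 'Wind': 'Weak', 'PlayTennis': 'Yes'},
    {'Outlook': 'Overcast', 'Wind': 'Weak', 'PlayTennis': 'Yes'},
    {'Outlook': 'Rain', 'Wind': 'Strong', 'PlayTennis': 'No'},
    {'Outlook': 'Rain', 'Wind': 'Weak', 'PlayTennis': 'Yes'},
]
print(round(info_gain(data, 'Outlook', 'PlayTennis'), 3))  # 0.266
print(round(info_gain(data, 'Wind', 'PlayTennis'), 3))     # 0.159
```

Since 0.266 > 0.159, 'Outlook' wins the root split.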
examples = [
{'Color': 'Red', 'Size': 'Big', 'Label': 'Positive'},
{'Color': 'Red', 'Size': 'Small', 'Label': 'Positive'},
{'Color': 'Blue', 'Size': 'Big', 'Label': 'Negative'},
{'Color': 'Blue', 'Size': 'Small', 'Label': 'Negative'}
]
attributes = ['Color', 'Size']
target_attr = 'Label'

Expected output:

{
'Color': {
'Blue': 'Negative',
'Red': 'Positive'
}
}

Analysis:
Initial entropy: 2 'Positive', 2 'Negative' → H(S) = -1/2·log₂(1/2) - 1/2·log₂(1/2) = 1.0 (maximum entropy)
Information gain calculation: IG(S, Color) = 1.0, since both Color subsets are pure; IG(S, Size) = 0.0, since each Size subset keeps the 50/50 class split
'Color' selected as root (IG = 1.0)
Both branches lead to pure subsets: 'Red' → 'Positive' and 'Blue' → 'Negative'
The tree achieves perfect classification with just one split because 'Color' perfectly separates the classes.
examples = [
{'Feature': 'A', 'Class': 'Yes'},
{'Feature': 'B', 'Class': 'Yes'},
{'Feature': 'C', 'Class': 'Yes'}
]
attributes = ['Feature']
target_attr = 'Class'

Expected output:

'Yes'

Base case activation:
Since all 3 examples belong to the same class 'Yes', the dataset is already pure (entropy = 0).
The algorithm immediately returns the class label 'Yes' as a leaf node without any splitting.
This demonstrates the first termination condition: when all examples share the same target value, return that value directly without creating a decision node.
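Putting the pieces together, here is one possible recursive construction. This is a sketch rather than a reference solution: it assumes that when attributes run out, a majority vote breaks ties by first appearance, which happens to match the expected outputs shown above for these inputs.

```python
from collections import Counter
from math import log2

def entropy(labels):
    n = len(labels)
    return -sum((c / n) * log2(c / n) for c in Counter(labels).values())

def info_gain(examples, attr, target_attr):
    n = len(examples)
    total = entropy([ex[target_attr] for ex in examples])
    remainder = 0.0
    for v in set(ex[attr] for ex in examples):
        subset = [ex[target_attr] for ex in examples if ex[attr] == v]
        remainder += (len(subset) / n) * entropy(subset)
    return total - remainder

def build_tree(examples, attributes, target_attr):
    labels = [ex[target_attr] for ex in examples]
    # Termination 1: all examples share one class -> leaf with that label.
    if len(set(labels)) == 1:
        return labels[0]
    # Termination 2: attributes exhausted -> majority label.
    # Counter preserves insertion order, so a tie goes to the label seen first.
    if not attributes:
        return Counter(labels).most_common(1)[0][0]
    # Otherwise split on the attribute with the highest information gain.
    best = max(attributes, key=lambda a: info_gain(examples, a, target_attr))
    remaining = [a for a in attributes if a != best]
    return {best: {
        v: build_tree([ex for ex in examples if ex[best] == v],
                      remaining, target_attr)
        for v in sorted(set(ex[best] for ex in examples))
    }}

rows = [
    {'Color': 'Red', 'Size': 'Big', 'Label': 'Positive'},
    {'Color': 'Red', 'Size': 'Small', 'Label': 'Positive'},
    {'Color': 'Blue', 'Size': 'Big', 'Label': 'Negative'},
    {'Color': 'Blue', 'Size': 'Small', 'Label': 'Negative'},
]
print(build_tree(rows, ['Color', 'Size'], 'Label'))
# {'Color': {'Blue': 'Negative', 'Red': 'Positive'}}
```

Note that in the first dataset the 'Sunny'/'Weak' branch is a 1-1 tie between 'No' and 'Yes'; the first-appearance tie-break above yields 'No', matching the expected output, but other tie-breaking rules are equally defensible unless the problem pins one down.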
Constraints