Loading problem...
In data mining and knowledge discovery, one of the most powerful techniques for understanding large datasets is frequent pattern mining. This approach identifies item combinations that appear together with notable regularity across a collection of records—revealing hidden associations and co-occurrence patterns that drive business insights.
Given a dataset of transactions (where each transaction is a collection of items) and a minimum support threshold (expressed as a fraction between 0 and 1), your task is to discover all frequent itemsets—combinations of items whose occurrence frequency meets or exceeds the specified threshold.
Understanding Support: The support of an itemset is the proportion of transactions in which that itemset appears. For example, if an itemset {apple, banana} appears in 6 out of 10 transactions, its support is 0.6 (or 60%).
$$\text{support}(X) = \frac{|{T \in D : X \subseteq T}|}{|D|}$$
Where:
The Levelwise Discovery Approach: The algorithm operates by leveraging the downward closure property (also called the anti-monotone property): if an itemset is infrequent, then all of its supersets must also be infrequent. This allows efficient pruning of the search space by:
Your Task: Implement a function that discovers all frequent itemsets in a transaction database. The function should:
transactions = [
["bread", "milk"],
["beer", "bread", "diaper", "eggs"],
["beer", "cola", "diaper", "milk"],
["beer", "bread", "diaper", "milk"],
["bread", "cola", "diaper", "milk"]
]
min_support = 0.6{
frozenset({'bread'}): 0.8,
frozenset({'beer'}): 0.6,
frozenset({'milk'}): 0.8,
frozenset({'diaper'}): 0.8,
frozenset({'beer', 'diaper'}): 0.6,
frozenset({'bread', 'milk'}): 0.6,
frozenset({'diaper', 'milk'}): 0.6,
frozenset({'bread', 'diaper'}): 0.6
}With 5 transactions and min_support = 0.6 (meaning an itemset must appear in at least 3 transactions):
1-itemsets: • 'bread' appears in transactions 1, 2, 4, 5 → support = 4/5 = 0.8 ✓ • 'beer' appears in transactions 2, 3, 4 → support = 3/5 = 0.6 ✓ • 'milk' appears in transactions 1, 3, 4, 5 → support = 4/5 = 0.8 ✓ • 'diaper' appears in transactions 2, 3, 4, 5 → support = 4/5 = 0.8 ✓ • 'eggs' appears only in transaction 2 → support = 1/5 = 0.2 ✗ • 'cola' appears in transactions 3, 5 → support = 2/5 = 0.4 ✗
2-itemsets (formed from frequent 1-itemsets): • {beer, diaper} appears in transactions 2, 3, 4 → support = 0.6 ✓ • {bread, milk} appears in transactions 1, 4, 5 → support = 0.6 ✓ • {diaper, milk} appears in transactions 3, 4, 5 → support = 0.6 ✓ • {bread, diaper} appears in transactions 2, 4, 5 → support = 0.6 ✓ • Other 2-itemset combinations have support < 0.6
3-itemsets: No 3-itemset achieves support ≥ 0.6, so the algorithm terminates.
transactions = [
["apple", "banana"],
["apple", "orange"],
["apple", "grape"],
["banana", "orange"]
]
min_support = 0.5{
frozenset({'banana'}): 0.5,
frozenset({'apple'}): 0.75,
frozenset({'orange'}): 0.5
}With 4 transactions and min_support = 0.5 (itemset must appear in at least 2 transactions):
1-itemsets: • 'apple' appears in transactions 1, 2, 3 → support = 3/4 = 0.75 ✓ • 'banana' appears in transactions 1, 4 → support = 2/4 = 0.5 ✓ • 'orange' appears in transactions 2, 4 → support = 2/4 = 0.5 ✓ • 'grape' appears only in transaction 3 → support = 1/4 = 0.25 ✗
2-itemsets: • {apple, banana} appears only in transaction 1 → support = 0.25 ✗ • {apple, orange} appears only in transaction 2 → support = 0.25 ✗ • {banana, orange} appears only in transaction 4 → support = 0.25 ✗
No 2-itemsets or higher meet the threshold, so only 1-itemsets are returned.
transactions = [
["bread", "eggs", "milk"],
["bread", "butter", "milk"],
["bread", "cheese"],
["eggs", "milk"],
["bread", "milk"]
]
min_support = 0.8{
frozenset({'bread'}): 0.8,
frozenset({'milk'}): 0.8
}With 5 transactions and a high min_support = 0.8 (itemset must appear in at least 4 transactions):
1-itemsets: • 'bread' appears in transactions 1, 2, 3, 5 → support = 4/5 = 0.8 ✓ • 'milk' appears in transactions 1, 2, 4, 5 → support = 4/5 = 0.8 ✓ • 'eggs' appears in transactions 1, 4 → support = 2/5 = 0.4 ✗ • 'butter' appears only in transaction 2 → support = 0.2 ✗ • 'cheese' appears only in transaction 3 → support = 0.2 ✗
2-itemsets: • {bread, milk} appears in transactions 1, 2, 5 → support = 3/5 = 0.6 ✗
The high threshold eliminates all multi-item combinations, leaving only the two most common individual items.
Constraints