Re: association rules in the FHM algo
Date: January 01, 2018 03:44AM
Hello,
Thanks for using SPMF.
The traditional algorithm for generating association rules from itemsets is implemented in SPMF. It is called AlgoAgrawalFaster94 in the source code, and is based on Agrawal's paper on the Apriori algorithm. However, this code is designed to be applied to frequent itemsets rather than high utility itemsets.
One could think that it is easy to apply the same algorithm to high utility itemsets to generate high utility association rules. It could be, but there is still a challenge, which I will explain.
In frequent itemset mining, if an itemset {a,b,c} is frequent, then all its subsets are also frequent. This property is very useful for generating rules. If I remember well, AlgoAgrawalFaster94 takes pairs of itemsets such as X = {a,b,c} and Y = {b,c} to generate a rule {a} --> {b,c}. The key point is that if {a,b,c} is frequent, then we know that {a} and {b,c} are also frequent, so their support values are available and it is easy to calculate the confidence and lift of the rule {a} --> {b,c}.
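To make the calculation concrete, here is a minimal sketch of the two measures, computed from absolute support counts. The class RuleMeasures and its method names are hypothetical and are not part of SPMF; the counts in main are made up for illustration.

```java
// Hypothetical helper, not part of SPMF: rule measures from absolute support counts.
final class RuleMeasures {

    // confidence(X --> Y) = sup(X union Y) / sup(X)
    static double confidence(int supXY, int supX) {
        return (double) supXY / supX;
    }

    // lift(X --> Y) = sup(X union Y) * N / (sup(X) * sup(Y)),
    // where N is the number of transactions in the database.
    static double lift(int supXY, int supX, int supY, int databaseSize) {
        return ((double) supXY * databaseSize) / ((double) supX * supY);
    }

    public static void main(String[] args) {
        // Made-up counts for the rule {a} --> {b,c}:
        // 10 transactions, sup({a,b,c}) = 2, sup({a}) = 4, sup({b,c}) = 5
        System.out.println(confidence(2, 4));   // prints 0.5
        System.out.println(lift(2, 4, 5, 10));  // prints 1.0
    }
}
```

Both measures need the support of the antecedent {a}, and lift also needs the support of the consequent {b,c}. With frequent itemsets these values are always in the mining output.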
In high utility itemset mining, on the other hand, if an itemset {a,b,c} is a high utility itemset, its subsets may or may not be high utility itemsets. This causes a problem: if you try to combine two high utility itemsets X = {a,b,c} and Y = {b,c} to generate a rule {a} --> {b,c}, the itemset {a} (or, in general, any subset needed for the rule) may not be a high utility itemset and so may be missing from the results. In that case, you don't have the information required to calculate the lift and confidence.
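A toy example may help to see this. The sketch below uses a made-up database of three transactions, where each item has a utility in each transaction, and a made-up threshold minutil = 15. The class UtilityDemo and its methods are hypothetical and are not SPMF code.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical demo, not part of SPMF: a subset of a high utility itemset
// is not necessarily a high utility itemset.
final class UtilityDemo {

    // Builds one transaction from alternating (item, utility) arguments.
    static Map<String, Integer> transaction(Object... itemUtilityPairs) {
        Map<String, Integer> t = new HashMap<>();
        for (int i = 0; i < itemUtilityPairs.length; i += 2) {
            t.put((String) itemUtilityPairs[i], (Integer) itemUtilityPairs[i + 1]);
        }
        return t;
    }

    // Utility of an itemset: the sum of its items' utilities over every
    // transaction that contains all of its items.
    static int utility(List<Map<String, Integer>> database, List<String> itemset) {
        int total = 0;
        for (Map<String, Integer> t : database) {
            if (t.keySet().containsAll(itemset)) {
                for (String item : itemset) {
                    total += t.get(item);
                }
            }
        }
        return total;
    }

    // Made-up data: {a,b,c} appears twice, {a} appears alone once.
    static List<Map<String, Integer>> toyDatabase() {
        return Arrays.asList(
            transaction("a", 1, "b", 5, "c", 4),
            transaction("a", 1, "b", 5, "c", 4),
            transaction("a", 1));
    }

    public static void main(String[] args) {
        List<Map<String, Integer>> db = toyDatabase();
        int minutil = 15;
        System.out.println(utility(db, Arrays.asList("a", "b", "c")) >= minutil); // true  (utility 20)
        System.out.println(utility(db, Arrays.asList("b", "c")) >= minutil);      // true  (utility 18)
        System.out.println(utility(db, Arrays.asList("a")) >= minutil);           // false (utility 3)
    }
}
```

Here {a,b,c} and {b,c} are high utility itemsets but {a} is not, so a high utility itemset miner would not output {a}, and its support would be unknown when generating the rule {a} --> {b,c}.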
So I think that is the key problem. How can it be solved?
- One solution could be to use FHMFreq, which is a version of FHM that offers both a utility constraint and a support constraint. You could set the minimum support high and set minutil = 0. This will generate many patterns that may have a low utility, but at least you will get the support values required for generating the association rules. Then you could try to apply AlgoAgrawalFaster94 from there (but it would require some coding).
- Or, if the algorithm doesn't have the information about an itemset {a} that is needed to generate a rule, you could scan the database again to get that information. This may be slow, but it could be a solution. To do it this way, you would change the "CalculateSupport()" method in AlgoAgrawalFaster94.
- Or...
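The second option above (rescanning the database when the support of an itemset is missing) could look roughly like the sketch below. It is only an illustration under my own assumptions: the class SupportLookup, its methods, and the representation of transactions as sets of integers are hypothetical, and this is not how the support calculation in AlgoAgrawalFaster94 is actually written.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical sketch, not SPMF code: look up the support of an itemset among
// the mined patterns first, and scan the database only when it is missing.
final class SupportLookup {
    private final List<Set<Integer>> database;
    private final Map<Set<Integer>, Integer> knownSupports = new HashMap<>();
    int scans = 0; // number of database scans performed so far

    SupportLookup(List<Set<Integer>> database) {
        this.database = database;
    }

    // Convenience helper to build an itemset or a transaction.
    static Set<Integer> setOf(Integer... items) {
        return new HashSet<>(Arrays.asList(items));
    }

    // Registers a support value already known from the mining step.
    void remember(Set<Integer> itemset, int support) {
        knownSupports.put(itemset, support);
    }

    // Returns the support of the itemset, scanning the database if needed.
    int support(Set<Integer> itemset) {
        Integer cached = knownSupports.get(itemset);
        if (cached != null) {
            return cached;
        }
        scans++;
        int count = 0;
        for (Set<Integer> transaction : database) {
            if (transaction.containsAll(itemset)) {
                count++;
            }
        }
        knownSupports.put(itemset, count); // avoid rescanning for the same itemset
        return count;
    }

    public static void main(String[] args) {
        // Made-up database: items 1, 2, 3 stand for a, b, c.
        SupportLookup lookup = new SupportLookup(
            Arrays.asList(setOf(1, 2, 3), setOf(1, 2, 3), setOf(1)));
        lookup.remember(setOf(1, 2, 3), 2);          // known from the mining step
        System.out.println(lookup.support(setOf(1))); // prints 3 (one scan)
        System.out.println(lookup.support(setOf(1))); // prints 3 (cached, no new scan)
        System.out.println(lookup.scans);             // prints 1
    }
}
```

Caching the scanned values limits the cost to one scan per missing itemset, but on a large database this can still be slow, as noted above.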
By the way, the implementation of FHM in SPMF always saves its results to a file. If you want to use these results to generate rules, you may also want to modify FHM so that it keeps the high utility itemsets in memory and then generates the rules from them, instead of saving the results to a file.
Best regards,