Hi Fendi,
I will explain to you how to add the lift measure to SPMF.
It is not very complciated. The lift is calculated as
lift( X --> Y) = sup(X U Y) / (sup(x)*sup(y)). Because the confidence is
confidence( X --> Y) = sup(X U Y) / sup(x), we can calculate the lift as follows: lift(x -->Y) = confidence(X -->Y) / sup(y)
So I will use this formula to calculate the lift.
What we first need to do is to modify the class
RuleAgrawal in the package
ca.pfv.spmf.associationrules.agrawal_FPGrowth_version by adding this method to calculate the lift:
public double getLift(){
return (double)getConfidence() / itemset2.getAbsoluteSupport();
}
Then, we will modify the code from the method
printRules() in the class
RulesAgrawal so that the lift is shown in the result. You just need to add one line:
public void printRules(int objectsCount){
System.out.println(" ------- " + name + " -------"
int i=0;
for(RuleAgrawal rule : rules){
System.out.print(" rule " + i + ": " + rule.toString());
System.out.print("support : " + rule.getRelativeSupport(objectsCount) +
" (" + rule.getAbsoluteSupport() + "/" + objectsCount + " "
System.out.print("confidence : " + rule.getConfidence());
System.out.print(" lift : " + rule.getLift());
System.out.println(""
i++;
}
System.out.println(" --------------------------------"
}
After that, we will modify the class
AlgoFPGrowth in the package
ca.pfv.spmf.frequentpatterns.fpgrowth. We will add the method to sort the items in an itemset:
public void sortItemset(Itemset itemset){
Collections.sort(itemset.getItems(), new Comparator<Integer>() {
public int compare(Integer o1,Integer o2) {
return o1 - o2;
}
});
}
And we will add this line in the method
addAllCombinationsForPathAndPrefix of
AlgoFPGrowth::
private void addAllCombinationsForPathAndPrefix(FPNode node, Itemset prefix) {
// We add the node to the prefix
Itemset itemset = prefix.cloneItemset();
itemset.addItem(node.itemID);
itemset.setTransactioncount(node.counter);
sortItemset(itemset);
frequentItemsets.addItemset(itemset, itemset.size());
// recursive call if there is a node link
if(node.nodeLink != null){
addAllCombinationsForPathAndPrefix(node.nodeLink, prefix);
addAllCombinationsForPathAndPrefix(node.nodeLink, itemset);
}
}
And we will add this line in the method
fpgrowthMoreThanOnePath of
AlgoFPGrowth:
private void fpgrowthMoreThanOnePath(FPTree tree, Itemset prefixAlpha, Map<Integer, Integer> mapSupport) {
// We process each frequent item in the header table list of the tree in reverse order.
for(int i= tree.headerList.size()-1; i>=0; i--){
Integer item = tree.headerList.get(i);
int support = mapSupport.get(item);
// if the item is not frequent, we skip it
if(support < relativeMinsupp){
continue;
}
// Create Beta by concatening Alpha with the current item
// and add it to the list of frequent patterns
Itemset beta = prefixAlpha.cloneItemset();
beta.addItem(item);
if(prefixAlpha.getAbsoluteSupport() < support){
beta.setTransactioncount(prefixAlpha.getAbsoluteSupport());
}else{
beta.setTransactioncount(support);
}
sortItemset(beta);
frequentItemsets.addItemset(beta, beta.size());
// === Construct beta's conditional pattern base ===
// It is a subdatabase which consists of the set of prefix paths
// in the FP-tree co-occuring with the suffix pattern.
List<List<FPNode>> prefixPaths = new ArrayList<List<FPNode>>();
FPNode path = tree.mapItemNodes.get(item);
while(path != null){
// if the path is not just the root node
if(path.parent.itemID != -1){
// create the prefixpath
List<FPNode> prefixPath = new ArrayList<FPNode>();
// add this node.
prefixPath.add(path); // NOTE: we add it just to keep its support,
// actually it should not be part of the prefixPath
//Recursively add all the parents of this node.
FPNode parent = path.parent;
while(parent.itemID != -1){
prefixPath.add(parent);
parent = parent.parent;
}
prefixPaths.add(prefixPath);
}
// We will look for the next prefixpath
path = path.nodeLink;
}
// (A) Calculate the frequency of each item in the prefixpath
Map<Integer, Integer> mapSupportBeta = new HashMap<Integer, Integer>();
// for each prefixpath
for(List<FPNode> prefixPath : prefixPaths){
// the support of the prefixpath is the support of its first node.
int pathCount = prefixPath.get(0).counter;
for(int j=1; j<prefixPath.size(); j++){ // for each node, except the first one, we count the frequency
FPNode node = prefixPath.get(j);
if(mapSupportBeta.get(node.itemID) == null){
mapSupportBeta.put(node.itemID, pathCount);
}else{
mapSupportBeta.put(node.itemID, mapSupportBeta.get(node.itemID) + pathCount);
}
}
}
// ( Construct beta's conditional FP-Tree
FPTree treeBeta = new FPTree();
// add each prefixpath in the FP-tree
for(List<FPNode> prefixPath : prefixPaths){
treeBeta.addPrefixPath(prefixPath, mapSupportBeta, relativeMinsupp);
}
treeBeta.createHeaderList(mapSupportBeta);
// Mine recursively the Beta tree.
if(treeBeta.root.childs.size() > 0){
fpgrowth(treeBeta, beta, mapSupportBeta);
}
}
}
and lastly, we need to add this line in the method code of
AlgoAgrawalFaster94_FPGrowth_version:
private void apGenrules(int k, int m, Itemset lk, Set<Itemset> Hm) {
// System.out.println(" " + lk.toString() + " " + Hm.toString());
if(k > m+1){
Set<Itemset> Hm_plus_1 = generateCandidateSizeK(Hm);
Set<Itemset> Hm_plus_1_for_recursion = new HashSet<Itemset>();
for(Itemset hm_P_1 : Hm_plus_1){
Itemset itemset_Lk_minus_hm_P_1 = lk.cloneItemSetMinusAnItemset(hm_P_1);
calculateSupport(itemset_Lk_minus_hm_P_1); // THIS COULD BE DONE ANOTHER WAY ?
// IT COULD PERHAPS BE IMPROVED....
calculateSupport(hm_P_1); // if we want to calculate the lift, we need to add this.
// System.out.println(hm_P_1.getAbsoluteSupport());
double conf = ((double)lk.getAbsoluteSupport()) / ((double)itemset_Lk_minus_hm_P_1.getAbsoluteSupport());
if(conf >= minconf){
RuleAgrawal rule = new RuleAgrawal(itemset_Lk_minus_hm_P_1, hm_P_1, lk.getAbsoluteSupport(), conf);
rules.addRule(rule);
Hm_plus_1_for_recursion.add(hm_P_1);
}
}
apGenrules(k, m+1, lk, Hm_plus_1_for_recursion);
}
}
This latter line will calculate sup(Y) for each rule. For this to work, the itemset need to be sorted. This is why, i have added the sortItemset(..) method.
After that, if you run the test file
MainTestAllAssociationRules_FPGrowth_version, the lift will be shown in the console:
------- All association rules -------
rule 0: 5 ==> 4 support : 0.5 (3/6) confidence : 0.6 lift : 0.15
rule 1: 4 ==> 5 support : 0.5 (3/6) confidence : 0.75 lift : 0.15
rule 2: 1 ==> 4 support : 0.5 (3/6) confidence : 0.75 lift : 0.1875
rule 3: 4 ==> 1 support : 0.5 (3/6) confidence : 0.75 lift : 0.1875
rule 4: 2 ==> 4 support : 0.6666666666666666 (4/6) confidence : 0.6666666666666666 lift : 0.16666666666666666
rule 5: 4 ==> 2 support : 0.6666666666666666 (4/6) confidence : 1.0 lift : 0.16666666666666666
rule 6: 3 ==> 5 support : 0.5 (3/6) confidence : 0.75 lift : 0.15
rule 7: 5 ==> 3 support : 0.5 (3/6) confidence : 0.6 lift : 0.15
rule 8: 2 ==> 3 support : 0.6666666666666666 (4/6) confidence : 0.6666666666666666 lift : 0.16666666666666666
rule 9: 3 ==> 2 support : 0.6666666666666666 (4/6) confidence : 1.0 lift : 0.16666666666666666
rule 10: 1 ==> 5 support : 0.6666666666666666 (4/6) confidence : 1.0 lift : 0.2
rule 11: 5 ==> 1 support : 0.6666666666666666 (4/6) confidence : 0.8 lift : 0.2
rule 12: 1 ==> 2 support : 0.6666666666666666 (4/6) confidence : 1.0 lift : 0.16666666666666666
rule 13: 2 ==> 1 support : 0.6666666666666666 (4/6) confidence : 0.6666666666666666 lift : 0.16666666666666666
rule 14: 2 ==> 5 support : 0.8333333333333334 (5/6) confidence : 0.8333333333333334 lift : 0.16666666666666669
rule 15: 5 ==> 2 support : 0.8333333333333334 (5/6) confidence : 1.0 lift : 0.16666666666666666
rule 16: 2 5 ==> 4 support : 0.5 (3/6) confidence : 0.6 lift : 0.15
rule 17: 2 4 ==> 5 support : 0.5 (3/6) confidence : 0.75 lift : 0.15
rule 18: 4 5 ==> 2 support : 0.5 (3/6) confidence : 1.0 lift : 0.16666666666666666
rule 19: 4 ==> 2 5 support : 0.5 (3/6) confidence : 0.75 lift : 0.15
rule 20: 5 ==> 2 4 support : 0.5 (3/6) confidence : 0.6 lift : 0.15
rule 21: 1 5 ==> 4 support : 0.5 (3/6) confidence : 0.75 lift : 0.1875
rule 22: 1 4 ==> 5 support : 0.5 (3/6) confidence : 1.0 lift : 0.2
rule 23: 4 5 ==> 1 support : 0.5 (3/6) confidence : 1.0 lift : 0.25
rule 24: 5 ==> 1 4 support : 0.5 (3/6) confidence : 0.6 lift : 0.19999999999999998
rule 25: 1 ==> 4 5 support : 0.5 (3/6) confidence : 0.75 lift : 0.25
rule 26: 4 ==> 1 5 support : 0.5 (3/6) confidence : 0.75 lift : 0.1875
rule 27: 1 2 ==> 4 support : 0.5 (3/6) confidence : 0.75 lift : 0.1875
rule 28: 1 4 ==> 2 support : 0.5 (3/6) confidence : 1.0 lift : 0.16666666666666666
rule 29: 2 4 ==> 1 support : 0.5 (3/6) confidence : 0.75 lift : 0.1875
rule 30: 1 ==> 2 4 support : 0.5 (3/6) confidence : 0.75 lift : 0.1875
rule 31: 4 ==> 1 2 support : 0.5 (3/6) confidence : 0.75 lift : 0.1875
rule 32: 2 3 ==> 5 support : 0.5 (3/6) confidence : 0.75 lift : 0.15
rule 33: 2 5 ==> 3 support : 0.5 (3/6) confidence : 0.6 lift : 0.15
rule 34: 3 5 ==> 2 support : 0.5 (3/6) confidence : 1.0 lift : 0.16666666666666666
rule 35: 5 ==> 2 3 support : 0.5 (3/6) confidence : 0.6 lift : 0.15
rule 36: 3 ==> 2 5 support : 0.5 (3/6) confidence : 0.75 lift : 0.15
rule 37: 1 2 ==> 5 support : 0.6666666666666666 (4/6) confidence : 1.0 lift : 0.2
rule 38: 1 5 ==> 2 support : 0.6666666666666666 (4/6) confidence : 1.0 lift : 0.16666666666666666
rule 39: 2 5 ==> 1 support : 0.6666666666666666 (4/6) confidence : 0.8 lift : 0.2
rule 40: 5 ==> 1 2 support : 0.6666666666666666 (4/6) confidence : 0.8 lift : 0.2
rule 41: 2 ==> 1 5 support : 0.6666666666666666 (4/6) confidence : 0.6666666666666666 lift : 0.16666666666666666
rule 42: 1 ==> 2 5 support : 0.6666666666666666 (4/6) confidence : 1.0 lift : 0.2
rule 43: 1 2 5 ==> 4 support : 0.5 (3/6) confidence : 0.75 lift : 0.1875
rule 44: 1 2 4 ==> 5 support : 0.5 (3/6) confidence : 1.0 lift : 0.2
rule 45: 1 4 5 ==> 2 support : 0.5 (3/6) confidence : 1.0 lift : 0.16666666666666666
rule 46: 2 4 5 ==> 1 support : 0.5 (3/6) confidence : 1.0 lift : 0.25
rule 47: 2 4 ==> 1 5 support : 0.5 (3/6) confidence : 0.75 lift : 0.1875
rule 48: 1 2 ==> 4 5 support : 0.5 (3/6) confidence : 0.75 lift : 0.25
rule 49: 1 5 ==> 2 4 support : 0.5 (3/6) confidence : 0.75 lift : 0.1875
rule 50: 1 4 ==> 2 5 support : 0.5 (3/6) confidence : 1.0 lift : 0.2
rule 51: 4 5 ==> 1 2 support : 0.5 (3/6) confidence : 1.0 lift : 0.25
rule 52: 2 5 ==> 1 4 support : 0.5 (3/6) confidence : 0.6 lift : 0.19999999999999998
rule 53: 4 ==> 1 2 5 support : 0.5 (3/6) confidence : 0.75 lift : 0.1875
rule 54: 1 ==> 2 4 5 support : 0.5 (3/6) confidence : 0.75 lift : 0.25
rule 55: 5 ==> 1 2 4 support : 0.5 (3/6) confidence : 0.6 lift : 0.19999999999999998
--------------------------------
For example, if you consider the rule
4 ==> 1 2 5, the lift is calculated as lift(4--> 1 2 5) = sup(4 1 2 5) / (sup(4)* sup( 1 2 5) = 3 / (4*4) = 0.1875.
If you want to add the lift to the version of FPGrowth that save the result to a file, the idea should be the same, except that you will have to modify files in the package "
ca.pfv.spmf.associationrules.agrawal_FPGrowth_version_saveToFile".
Hope this helps!
Philippe
Edited 7 time(s). Last edit at 07/04/2012 07:53AM by webmasterphilfv.