The Data Mining Forum
IMPORTANT: This is the old Data Mining forum.
I keep it online so that you can read the old messages.

Please post your new messages in the new forum: https://forum2.philippe-fournier-viger.com/index.php
 
Class association rule induction
Posted by: Hoa Vu
Date: February 20, 2014 11:02PM

Hi,

I have a problem about CARs (class association rules, Liu et al. 1998): I am only interested in rules with some fixed target items on the right-hand side. My data set is very large (55k transactions, 400k items), so Apriori cannot handle it efficiently, because it must generate all the frequent itemsets and all the association rules, and only then filter them to keep the interesting rules.

Do you have any function in your nice implementation that fits my need, or could you please suggest what I should do in this case? I tried to find one but did not find it.

Thank you very much!

Best,
Hoa Vu

Re: Class association rule induction
Date: February 21, 2014 03:55AM

Hello Hoa,

There is no algorithm in SPMF that provides this kind of feature, but it would be easy to add it.

I will explain what you would need to do.

First, as you probably know, mining association rules is done in two steps. The first step is to discover frequent itemsets. The second step is to generate association rules from the frequent itemsets.

For the first step, I would recommend using FPGrowth instead of Apriori because it is usually much faster than Apriori.

It would be possible to modify FPGrowth so that only itemsets containing the target item are output.

Also, you could add the constraint that itemsets should not exceed a maximum size. This would help to reduce the number of patterns found. For example, if you only want rules with fewer than 4 items, then you don't need to discover frequent itemsets with 4 or more items.

Then for the second step, you would need to modify the algorithm for generating rules so that only rules with a single item on the right side are generated. I don't remember all the details of the algorithm for generating rules, but it should probably not be too hard to do.
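
As a rough illustration of the two modifications described above (a size limit on itemsets, and rules with a single target item on the right side), here is a minimal Python sketch. It is not SPMF code and it does not implement FPGrowth: count_supports is a hypothetical brute-force stand-in for the frequent itemset miner, only practical on tiny data, and all function and parameter names are made up for this example.

```python
from itertools import combinations

def count_supports(transactions, max_size, min_count):
    """Brute-force stand-in for a frequent itemset miner (NOT FPGrowth).

    Returns {frozenset(itemset): support count} for every itemset with at
    most max_size items that occurs in at least min_count transactions.
    """
    counts = {}
    for transaction in transactions:
        items = sorted(transaction)
        for size in range(1, max_size + 1):
            for combo in combinations(items, size):
                key = frozenset(combo)
                counts[key] = counts.get(key, 0) + 1
    return {s: c for s, c in counts.items() if c >= min_count}

def rules_with_target(supports, target, min_conf):
    """Generate only rules of the form X -> {target}."""
    rules = []
    for itemset, count in supports.items():
        # Skip itemsets without the target, and the target on its own.
        if target not in itemset or len(itemset) < 2:
            continue
        antecedent = itemset - {target}
        # Every subset of a frequent itemset is also frequent, so the
        # antecedent is present when the table is complete up to max_size.
        confidence = count / supports[antecedent]
        if confidence >= min_conf:
            rules.append((antecedent, target, count, confidence))
    return rules
```

With max_size = 3, for instance, the table only ever holds itemsets of up to 3 items, so every generated rule has at most 2 items on the left side and the target on the right.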

Overall, the easiest modification is to add a maximum size for your association rules by modifying FPGrowth. If I were you, I would try this modification first. It would most likely greatly reduce the number of patterns found.

Philippe

Re: Class association rule induction
Posted by: Hoa Vu
Date: February 22, 2014 11:07PM

Hi Philippe,

Thank you very much for your reply. The 2 steps you suggest would definitely fit my needs. However, while working on that, I found another issue: the itemsets I am interested in are very rare in the dataset (~200/55k transactions)!

If I lower minsup to get these rules, it still costs me a lot, even with FP-Growth. When I set minsup = 1% (mincount = 550 > 200), the output contains 6855736 frequent itemsets. So I think I will try another way:

Step 1: Divide the dataset into two parts: D1, with all the transactions that contain the itemsets of interest, and D2, with the transactions that do not. Then find all the association rules (X -> Y) in D1 that fit my need.

Step 2: Calculate the frequency of X in D2 to calculate the confidence of the rule (X -> Y). There should not be many X.

==> Now my problem becomes finding the frequency of a particular itemset in a very large dataset. If I am not wrong, your function called "algorithms for building, updating and querying an Itemset-Tree" would solve this problem?
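
The two steps above can be sketched in Python as follows. This is only an illustration under simple assumptions: transactions are sets of items held in memory, the function names are hypothetical, and the frequency lookup is a plain linear scan, which is exactly the part a faster structure would need to replace on a very large dataset.

```python
def split_by_target(transactions, target_items):
    """Step 1: split into D1 (contains every target item) and D2 (the rest)."""
    target = set(target_items)
    d1 = [t for t in transactions if target <= set(t)]
    d2 = [t for t in transactions if not (target <= set(t))]
    return d1, d2

def count_itemset(transactions, itemset):
    """Number of transactions containing every item of itemset (linear scan)."""
    wanted = set(itemset)
    return sum(1 for t in transactions if wanted <= set(t))

def rule_confidence(d1, d2, antecedent):
    """Step 2: confidence of the rule antecedent -> target.

    Every transaction in D1 contains the target Y, so the count of X in D1
    is the support of X together with Y. The support of X alone is its
    count in D1 plus its count in D2.
    """
    in_d1 = count_itemset(d1, antecedent)
    in_d2 = count_itemset(d2, antecedent)
    total = in_d1 + in_d2
    return in_d1 / total if total else 0.0
```

The confidence only needs one extra count per antecedent X in D2, which is why the approach stays cheap when there are few candidate X.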

Thank you very much!
Hoa Vu

Re: Class association rule induction
Date: February 23, 2014 03:19AM

Hello,

I did not think about the Itemset Tree. But it is true that this algorithm makes it possible to find all association rules that have an itemset Y on the right side of the rule.

For example, you can build an Itemset Tree by using your transactions.

Then, you can perform a query to find all rules of the form ? --> Y to find all rules with Y on the right side.

It would be easy to try this by looking at the examples for the itemset tree.

Philippe



This forum is powered by Phorum and provided by P. Fournier-Viger (© 2012).