The Data Mining Forum
IMPORTANT: This is the old Data Mining forum.
I keep it online so that you can read the old messages.

Please post your new messages in the new forum: https://forum2.philippe-fournier-viger.com/index.php
 
ItemSetTree: Performance degradation
Posted by: Andreas
Date: February 08, 2015 05:58AM

Hi,

I'm using the latest version of ItemSetTree to mine some rules. I'm adding transactions incrementally.

If I have around 30 possible transaction IDs and 80 examples, the mining process takes less than a second. With around 45, it starts taking more than 30 seconds (on a 2012 MacBook Air).

Any hints what's going on here?

Many thanks,

Andreas

Re: ItemSetTree: Performance degradation
Date: February 08, 2015 02:57PM

I have sometimes used itemset trees with more than 300,000 transactions. Actually, performance always depends on the data. In itemset mining, or frequent pattern mining in general, the complexity of discovering patterns is a function of the number of patterns in your data, not of the size of your data.

For some datasets, the task becomes very computationally expensive because there are too many patterns, even if the number of transactions is small. The number of patterns depends on (1) how similar your transactions are (how dense your data is), (2) how long your transactions are, (3) how many items you have, etc. For example, you may have only two transactions:

1 2 3 4 5 6 7 8 9 10 11 ..., 19 20
1 2 3 4 5 6 7 8 9 10 11 ..., 19 20

But since they are the same, every non-empty subset of the 20 items appears in both, so there will be 2^20 - 1 = 1,048,575 frequent itemsets in only these two transactions. So it really depends on your data.
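To make the counting argument concrete, here is a minimal brute-force sketch in Python. It is not the ItemSetTree implementation or any SPMF API; the function name `count_frequent_itemsets` is made up for illustration, and the example uses 10 items instead of 20 so that the exhaustive enumeration finishes quickly. It counts only non-empty itemsets, which is why the result is 2^n - 1.

```python
from itertools import combinations

def count_frequent_itemsets(transactions, minsup):
    """Brute-force count of non-empty itemsets contained in at least
    `minsup` transactions (illustrative helper, not a library function)."""
    items = sorted(set().union(*transactions))
    count = 0
    for size in range(1, len(items) + 1):
        for candidate in combinations(items, size):
            support = sum(1 for t in transactions if t.issuperset(candidate))
            if support >= minsup:
                count += 1
    return count

# Two identical transactions over n items (n kept small for brute force).
n = 10
identical = [set(range(1, n + 1)), set(range(1, n + 1))]
print(count_frequent_itemsets(identical, minsup=2))  # 2**10 - 1 = 1023

# Two transactions with no item in common: nothing is shared by both.
disjoint = [set(range(1, 6)), set(range(6, 11))]
print(count_frequent_itemsets(disjoint, minsup=2))  # 0
```

Both datasets contain two transactions and 10 distinct items, yet one yields 1023 patterns and the other none. Each extra item in the identical case doubles the count, which is how 20 items reach about one million.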

Moreover, it also depends a lot on how you set the thresholds (e.g. minsup) for the mining task. For example, if you set minsup = 0, there can sometimes be hundreds of millions of itemsets, and the algorithm will just never terminate, or it may even fill your hard drive with patterns ;-) So choosing appropriate values for the parameters is also important. In general, if the parameters are set higher, you will find fewer patterns and it will be faster. Moreover, in general, as you lower minsup, the number of patterns may at some point increase exponentially, and so does the execution time.
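The effect of the threshold can be sketched the same way. The snippet below is an illustrative assumption, not the forum software or the SPMF code: the toy dataset and the helper `itemset_supports` are invented for the example. It computes the support of every non-empty itemset once, then counts how many pass each minsup value.

```python
from itertools import combinations

def itemset_supports(transactions):
    """Map every non-empty itemset over the observed items to its support
    (illustrative brute force; only feasible for a handful of items)."""
    items = sorted(set().union(*transactions))
    supports = {}
    for size in range(1, len(items) + 1):
        for candidate in combinations(items, size):
            supports[candidate] = sum(
                1 for t in transactions if t.issuperset(candidate)
            )
    return supports

# Hypothetical toy dataset: 5 transactions over 6 items.
transactions = [{1, 2, 3}, {1, 2, 4}, {1, 3, 5}, {2, 3, 6}, {1, 2, 3, 4}]
supports = itemset_supports(transactions)

# Count the itemsets that satisfy each threshold, from strict to loose.
counts_by_minsup = {
    minsup: sum(1 for s in supports.values() if s >= minsup)
    for minsup in range(len(transactions), -1, -1)
}
for minsup, count in counts_by_minsup.items():
    print(f"minsup={minsup}: {count} itemsets")
```

The count can only grow as minsup is lowered, and at minsup = 0 every one of the 2^6 - 1 = 63 possible itemsets qualifies, including combinations that never occur in the data. With a few dozen items instead of six, that same worst case is what makes a run never finish.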


If you want more comments, you may send your data, along with the parameters that you used, to my e-mail: philippe.fv AT gmail.com



Edited 4 time(s). Last edit at 02/08/2015 03:02PM by webmasterphilfv.



This forum is powered by Phorum and provided by P. Fournier-Viger (© 2012).