Re: ItemSetTree: Performance degradation
Date: February 08, 2015 02:57PM
I have sometimes used itemset trees with more than 300,000 transactions. In practice, performance always depends on the data. In itemset mining, and in frequent pattern mining in general, the cost of discovering patterns is a function of the number of patterns in your data, not of the size of your data.
For some datasets, the task becomes very computationally expensive because there are too many patterns, even if the number of transactions is small. The number of patterns depends on (1) how similar your transactions are (how dense your data is), (2) how long your transactions are, (3) how many items you have, etc. For example, you may have only two transactions:
1 2 3 4 5 6 7 8 9 10 11 ..., 19 20
1 2 3 4 5 6 7 8 9 10 11 ..., 19 20
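But since they are identical, every non-empty subset of these 20 items appears in both transactions, so there will be 2^20 - 1 = 1,048,575 frequent itemsets (2^20 = 1,048,576 if you also count the empty set) in only these two transactions. So it really depends on your data.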
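To make the counting argument concrete, here is a small brute-force sketch in Python. It is not how ItemSetTree or any real miner works; the function name count_frequent_itemsets is made up for illustration, and minsup is assumed to be an absolute number of transactions. It uses 10 items instead of 20 so that it runs instantly.

```python
from itertools import combinations

def count_frequent_itemsets(transactions, minsup):
    """Brute-force count of non-empty itemsets contained in at least
    `minsup` transactions (minsup is an absolute count, an assumption)."""
    sets = [frozenset(t) for t in transactions]
    items = sorted(set().union(*sets))
    count = 0
    # Try every non-empty combination of the items seen in the data.
    for size in range(1, len(items) + 1):
        for candidate in combinations(items, size):
            c = frozenset(candidate)
            support = sum(1 for t in sets if c <= t)
            if support >= minsup:
                count += 1
    return count

n = 10  # kept small on purpose; the post's example uses 20 items
identical = [list(range(1, n + 1))] * 2
print(count_frequent_itemsets(identical, minsup=2))  # 2**10 - 1 = 1023
```

With two identical transactions of n items, the count is 2^n - 1, so each extra item doubles the work even though the number of transactions stays at two.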
Moreover, it also depends a lot on how you set the thresholds (e.g. minsup) for the mining task. For example, if you set minsup = 0, there can sometimes be hundreds of millions of itemsets, and the algorithm will just never terminate or it may even fill your hard drive with patterns ;-) So choosing appropriate values for the parameters is also important. In general, if the thresholds are set higher, you will find fewer patterns and it will be faster. Conversely, as you lower minsup, the number of patterns, and with it the execution time, may increase exponentially at some point.
If you want more comments, you may send your data, along with the parameters that you used, to my e-mail: philippe.fv AT gmail.com
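The effect of the threshold can be seen on a toy dataset with a brute-force sketch like the one below. The data, the function names itemset_supports and pattern_counts, and the use of minsup as an absolute transaction count are all assumptions made for illustration; a real miner would not enumerate subsets this way.

```python
from itertools import combinations

def itemset_supports(transactions):
    """Map every non-empty itemset that occurs in the data to its support
    (the number of transactions containing it), by brute force."""
    supports = {}
    for t in transactions:
        items = sorted(set(t))
        for size in range(1, len(items) + 1):
            for subset in combinations(items, size):
                supports[subset] = supports.get(subset, 0) + 1
    return supports

def pattern_counts(transactions, thresholds):
    """Number of frequent itemsets for each minsup value in `thresholds`."""
    supports = itemset_supports(transactions)
    return {m: sum(1 for s in supports.values() if s >= m) for m in thresholds}

# Hypothetical toy data, not taken from any real dataset.
toy = [
    [1, 2, 3, 4],
    [1, 2, 3],
    [1, 2, 4],
    [2, 3, 4],
    [1, 2],
    [5],
]

for minsup, count in pattern_counts(toy, [6, 5, 4, 3, 2, 1]).items():
    print(f"minsup={minsup}: {count} frequent itemsets")
```

The count can only stay the same or grow as minsup is lowered, and on dense data with long transactions the growth becomes very steep, which is why a threshold that is slightly too low can make a run that finished in seconds appear to hang.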
Edited 4 time(s). Last edit at 02/08/2015 03:02PM by webmasterphilfv.