Re: Reg Association Rules
Date: September 20, 2013 06:10AM
For the memory usage, it depends on:
- the parameters that you use (minsup, minconf, minlift). The lower you set these parameters, the more association rules are generated (the number can grow exponentially), and the more likely you are to run out of memory. It is usually recommended to start with high values and then lower them gradually.
- the characteristics of your dataset. For example, if you have a very dense dataset where all transactions (lines) are very long and almost exactly the same, there may be a huge number of association rules, and the algorithm may simply not terminate or may run out of memory.
- the size of your dataset. A gigabyte is pretty huge for this kind of algorithm. Perhaps you could use some sampling or filtering to make the dataset smaller or to remove useless information. This would make the algorithm faster and use less memory.
- Another possibility, if you are a programmer, is to add some constraints to the algorithm, such as not mining rules with more than X items. This would greatly reduce the search space and make the algorithm faster and use less memory.
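To illustrate the sampling and filtering idea, here is a minimal sketch in Python. It is not part of SPMF. The function names (sample_transactions, drop_rare_items) and the in-memory list-of-lists representation are assumptions made only for this example; for a file of about a gigabyte you would probably want to process the lines one at a time instead.

```python
import random
from collections import Counter

def sample_transactions(transactions, fraction, seed=0):
    """Keep each transaction independently with probability `fraction`."""
    rng = random.Random(seed)
    return [t for t in transactions if rng.random() < fraction]

def drop_rare_items(transactions, min_count):
    """Remove items that occur in fewer than `min_count` transactions."""
    # Count each item at most once per transaction.
    counts = Counter(item for t in transactions for item in set(t))
    filtered = ([item for item in t if counts[item] >= min_count]
                for t in transactions)
    # Discard transactions that became empty after filtering.
    return [t for t in filtered if t]

# Hypothetical toy data: one list of integer item ids per transaction.
transactions = [[1, 2, 3], [1, 2], [1, 4], [2, 3], [5]]
smaller = drop_rare_items(transactions, min_count=2)
print(smaller)  # [[1, 2, 3], [1, 2], [1], [2, 3]]
```

Note that both steps change the number of transactions, so a relative minsup (a percentage) no longer corresponds to the same absolute count as on the full dataset.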
For the lift computation,
- can you provide a simple example that shows that the lift is incorrect?
- A reason why the lift may be incorrect is that the input format is not correct. For example, in the input format, an item is only allowed to appear once per line, and the items on each line should always be sorted according to the same order. If these conditions are not met, the algorithm may not calculate the measures correctly or may not generate all the rules.
- Another point is that I fixed an error in the lift calculation in the past. If you are using a version of SPMF older than v0.92, there was a problem with the lift, and you could update to the latest version.
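If it helps to build such an example, here is a small Python sketch for checking a lift value by hand on a tiny dataset. It is not SPMF code. It assumes the standard definition lift(X -> Y) = sup(X u Y) * N / (sup(X) * sup(Y)), where the supports are transaction counts and N is the number of transactions. The helper names (normalize, support, lift) are made up for this example. The normalize step also shows one way to make each line respect the input format (no duplicate items, items always in the same order).

```python
def normalize(transaction):
    """Remove duplicate items and sort them in ascending order."""
    return sorted(set(transaction))

def support(itemset, transactions):
    """Number of transactions that contain every item of `itemset`."""
    items = set(itemset)
    return sum(1 for t in transactions if items.issubset(t))

def lift(antecedent, consequent, transactions):
    """lift(X -> Y) = sup(X u Y) * N / (sup(X) * sup(Y)).

    Assumes X and Y each occur in at least one transaction,
    otherwise the division fails.
    """
    n = len(transactions)
    sup_xy = support(set(antecedent) | set(consequent), transactions)
    sup_x = support(antecedent, transactions)
    sup_y = support(consequent, transactions)
    return sup_xy * n / (sup_x * sup_y)

# Hypothetical toy data; the first line has a duplicate and is unsorted.
raw = [[2, 1, 1], [1, 2, 3], [3, 1], [2, 3]]
transactions = [normalize(t) for t in raw]
print(transactions)  # [[1, 2], [1, 2, 3], [1, 3], [2, 3]]

# sup({1}) = 3, sup({2}) = 3, sup({1, 2}) = 2, N = 4, so lift = 2 * 4 / (3 * 3)
print(lift([1], [2], transactions))
```

If the value computed this way on a small file differs from what the tool reports for the same rule, that small file would be a good example to post.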
Best,
Philippe
Edited 1 time(s). Last edit at 09/20/2013 06:11AM by webmasterphilfv.