is a sequence database, where each line is a sequence.
If all items appear at the same time in each sequence, TRuleGrowth will generate no results, because it looks for rules of the form X ==> Y, where X and Y are groups of items such that X occurs before Y. TRuleGrowth tries to find rules that are common to several sequences.
If all the items appear simultaneously in each line of your dataset, then it is not a sequence database. You could instead treat it as a transaction database (where there is no time) and apply an association rule mining algorithm. An association rule mining algorithm will also generate rules, but they will carry no time information.
Another important point is that all these algorithms generally assume that items appearing together are sorted in some order, so that certain pruning strategies can be used. If the items are not sorted, the algorithms may not generate the correct result. Also, an item should not appear twice in the same itemset. So, for example, you could sort each of your lines in increasing order of items and remove items that appear twice, to get this:
1 2 3 -1 -2
1 2 3 4 -1 -2
1 2 5 6 -1 -2
2 3 -1 -2
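If you would rather not clean the lines by hand, here is a minimal Python sketch of the sorting and de-duplication step. It assumes the usual SPMF sequence layout (items separated by spaces, -1 closing an itemset, -2 closing the sequence) and that items are integers. The function name `normalize_sequence_line` is just something I made up for illustration, it is not part of SPMF.

```python
def normalize_sequence_line(line: str) -> str:
    """Sort the items of each itemset and drop duplicates within an itemset.

    Assumes the SPMF sequence layout: -1 ends an itemset, -2 ends the sequence.
    """
    itemsets = []
    current = set()          # a set removes items that appear twice
    for token in line.split():
        value = int(token)
        if value == -1:      # end of the current itemset
            itemsets.append(sorted(current))
            current = set()
        elif value == -2:    # end of the sequence
            break
        else:
            current.add(value)
    if current:              # tolerate a missing -1 before -2
        itemsets.append(sorted(current))

    parts = []
    for itemset in itemsets:
        parts.extend(str(item) for item in itemset)
        parts.append("-1")
    parts.append("-2")
    return " ".join(parts)


if __name__ == "__main__":
    # Hypothetical unsorted input line with a repeated item.
    print(normalize_sequence_line("3 1 2 2 -1 -2"))
```

Items are only sorted inside each itemset. The order of the itemsets themselves is left alone, since that order is the time information.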
Now, if you want to apply an association rule mining algorithm in SPMF, you would also need to remove the -1 and -2, like this:
1 2 3
1 2 3 4
1 2 5 6
2 3
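The same conversion can be scripted. The sketch below is only an illustration under the assumption that every negative number in a line is a separator (-1 or -2) and every other token is an item. The helper name `sequence_line_to_transaction` is hypothetical. Note that if a line had several itemsets, this would merge them into a single transaction and the ordering would be lost, which is what you want only when there is no time in your data.

```python
def sequence_line_to_transaction(line: str) -> str:
    """Drop the -1 / -2 separators and return a sorted, duplicate-free transaction."""
    items = set()
    for token in line.split():
        value = int(token)
        if value >= 0:       # negative values are separators, not items
            items.add(value)
    return " ".join(str(item) for item in sorted(items))


if __name__ == "__main__":
    sequence_lines = [
        "1 2 3 -1 -2",
        "1 2 3 4 -1 -2",
        "1 2 5 6 -1 -2",
        "2 3 -1 -2",
    ]
    for sequence_line in sequence_lines:
        print(sequence_line_to_transaction(sequence_line))
```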
Hope this helps,
Edited 2 time(s). Last edit at 04/28/2013 06:40AM by webmasterphilfv.