Re: mine sequential patterns in repeated items itemsets
Date: September 22, 2014 01:49PM
Hi,
Welcome to the forum.
Yes, algorithms such as PrefixSpan, GSP, SPADE, etc. assume that no item can appear twice in the same itemset.
To apply these algorithms to your data, you would therefore need to remove the duplicate items within each itemset. For example:
S1 = ((1 1 1 2 2 1 2 1 2 1 2), (1 2 3 1 2 2 1), (1 3 1 2 2 1 1 1 2))
would become:
S1 = ((1 2), (1 2 3), (1 2 3))
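If it helps, here is a minimal Python sketch of that duplicate removal. The function names are only for illustration and are not part of SPMF. The second helper assumes the usual SPMF text format for sequence databases (items separated by spaces, -1 closing each itemset, -2 closing the sequence), so please check it against the documentation of the algorithm you run.

```python
def dedupe_itemsets(sequence):
    """Keep a single copy of each item in every itemset (items sorted)."""
    return [sorted(set(itemset)) for itemset in sequence]


def to_spmf_line(sequence):
    """Format one sequence as a line of text.

    Assumption: -1 ends an itemset and -2 ends the sequence.
    """
    parts = []
    for itemset in sequence:
        parts.extend(str(item) for item in itemset)
        parts.append("-1")
    parts.append("-2")
    return " ".join(parts)


# The sequence S1 from the example above, one inner list per itemset.
s1 = [
    [1, 1, 1, 2, 2, 1, 2, 1, 2, 1, 2],
    [1, 2, 3, 1, 2, 2, 1],
    [1, 3, 1, 2, 2, 1, 1, 1, 2],
]

cleaned = dedupe_itemsets(s1)
print(cleaned)                # [[1, 2], [1, 2, 3], [1, 2, 3]]
print(to_spmf_line(cleaned))  # 1 2 -1 1 2 3 -1 1 2 3 -1 -2
```

Note that this throws away how many times each item occurred, so it only makes sense if the counts do not matter for your application.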
Another option would be to recode your dataset differently, for example:
S1 = (1) (1) (1) (2) (2) (1 2) (1 2) (1 2), (1 2) (1 3) (2) (2) ...
I don't know if it would make sense for your application, though.
Another possibility is to look at "quantitative sequential pattern mining algorithms" such as SQUIRE ( http://www.cs.ubc.ca/~rng/psdepository/shim2007.pdf ), which is not offered in SPMF. This kind of algorithm allows items to have quantities in itemsets. For example, the item 1 may appear with a quantity of 2 in an itemset.
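To give an idea of what "items with quantities" means, here is a small Python sketch that turns each itemset into a mapping from item to quantity. This is only an illustration of the representation. It is not the input format of SQUIRE or of any specific tool, and the function name is made up for the example.

```python
from collections import Counter


def to_quantities(sequence):
    """Replace each itemset by a dict {item: number of occurrences}."""
    return [dict(sorted(Counter(itemset).items())) for itemset in sequence]


# Same sequence S1 as in the example above, one inner list per itemset.
s1 = [
    [1, 1, 1, 2, 2, 1, 2, 1, 2, 1, 2],
    [1, 2, 3, 1, 2, 2, 1],
    [1, 3, 1, 2, 2, 1, 1, 1, 2],
]

print(to_quantities(s1))
# [{1: 6, 2: 5}, {1: 3, 2: 3, 3: 1}, {1: 5, 2: 3, 3: 1}]
```

Here the first itemset contains item 1 with a quantity of 6 and item 2 with a quantity of 5, so no information about repetitions is lost.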
Best,
Edited 2 time(s). Last edit at 09/22/2014 01:50PM by webmasterphilfv.