Re: Data format for Sequential Patterns with time-series
Date: May 16, 2012 07:15PM
Hello Yogi,
Welcome to the forum.
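Sequential pattern mining algorithms like GSP take as input: (1) a sequence database and (2) a parameter called "minsup" (the minimum support, that is, the minimum number or percentage of sequences that must contain a pattern for it to be reported).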
A sequence database is a set of sequences.
The goal of sequential pattern mining is to find subsequences that are common to several sequences.
In your case, I think that you have only a single sequence.
I will give you an example of a sequence database.
For example, here are three sequences named s1, s2 and s3:
s1: (a), (b, c), (d), (e)
s2: (b), (d), (e), (f)
s3: (b), (d), (e), (f)
The first sequence means that item "a" occurred, was followed by "b" and "c" at the same time, then by "d", and then by "e".
The second sequence means that "b" was followed by "d", then by "e", then by "f".
The third...
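If it helps to see this as data, here is a small Python sketch of how the three example sequences could be stored in memory. The text notation "(a), (b, c), (d), (e)" is just the one used in this post, and the function name parse_sequence is made up for illustration; it is not the input format of any particular tool.

```python
import re

def parse_sequence(text):
    """Parse a string like '(a), (b, c), (d)' into a list of itemsets.

    Each pair of parentheses is one itemset (items that occur at the
    same time); the order of the itemsets is the order in time.
    """
    itemsets = []
    for group in re.findall(r"\(([^)]*)\)", text):
        items = frozenset(part.strip() for part in group.split(",") if part.strip())
        itemsets.append(items)
    return itemsets

# The sequence database from the example: a set of named sequences.
database = {
    "s1": parse_sequence("(a), (b, c), (d), (e)"),
    "s2": parse_sequence("(b), (d), (e), (f)"),
    "s3": parse_sequence("(b), (d), (e), (f)"),
}

for name, sequence in database.items():
    print(name, [sorted(itemset) for itemset in sequence])
```

The point is only that a sequence is an ordered list of itemsets, and a sequence database is a collection of such lists.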
So if you have a sequence database, you can find some subsequences that are common to several sequences with algorithms like GSP and PrefixSpan.
For example, those algorithms could find that the subsequence (b), (e) appears in s1, s2 and s3.
Another example is that (e), (f) appears in s2 and s3.
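To make "appears in" concrete, here is a minimal Python sketch that checks whether a pattern is a subsequence of a sequence and lists the sequences that contain it. This is only the containment test, not GSP or PrefixSpan themselves, and the names contains, supporting_sequences and is_frequent are my own. I also assume here that minsup is given as a number of sequences.

```python
def contains(sequence, pattern):
    """Return True if each itemset of `pattern` is included in a distinct
    itemset of `sequence`, in the same order (gaps are allowed)."""
    pos = 0
    for wanted in pattern:
        # Move forward to the next itemset that includes all wanted items.
        while pos < len(sequence) and not wanted <= sequence[pos]:
            pos += 1
        if pos == len(sequence):
            return False
        pos += 1  # the next pattern itemset must match strictly later
    return True

def supporting_sequences(database, pattern):
    """Names of the sequences in the database that contain the pattern."""
    return [name for name, seq in database.items() if contains(seq, pattern)]

def is_frequent(database, pattern, minsup):
    """Assumption: minsup is an absolute count of sequences."""
    return len(supporting_sequences(database, pattern)) >= minsup

database = {
    "s1": [{"a"}, {"b", "c"}, {"d"}, {"e"}],
    "s2": [{"b"}, {"d"}, {"e"}, {"f"}],
    "s3": [{"b"}, {"d"}, {"e"}, {"f"}],
}

print(supporting_sequences(database, [{"b"}, {"e"}]))  # ['s1', 's2', 's3']
print(supporting_sequences(database, [{"e"}, {"f"}]))  # ['s2', 's3']
print(is_frequent(database, [{"e"}, {"f"}], minsup=3))  # False
```

Note that (b), (e) is found in s1 even though "b" occurs together with "c" and "d" comes in between: items only have to appear in the right order, not next to each other.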
I just wrote this quickly to give you an idea of what those algorithms do.
If you have only a single sequence, these algorithms may not be appropriate, unless you can divide your sequence into several sequences and it makes sense for your application to do that.
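As one possible illustration of dividing a single sequence, here is a Python sketch that cuts a long sequence into consecutive fixed-size windows, each of which then becomes one sequence of the database. The fixed window size is purely an assumption on my part (for a time series it could be one window per day or per week), and split_into_windows is a made-up name. Whether such a split is meaningful depends on your data.

```python
def split_into_windows(sequence, window_size):
    """Cut one long sequence into consecutive, non-overlapping windows.

    The last window may be shorter if the length is not a multiple
    of window_size.
    """
    if window_size <= 0:
        raise ValueError("window_size must be a positive integer")
    return [
        sequence[start:start + window_size]
        for start in range(0, len(sequence), window_size)
    ]

# Hypothetical single sequence of events, for illustration only.
single_sequence = ["a", "b", "d", "e", "b", "d", "e", "f", "b", "e"]

sequence_database = split_into_windows(single_sequence, window_size=4)
for number, window in enumerate(sequence_database, start=1):
    print(f"s{number}: {window}")
```

Keep in mind that patterns crossing a window boundary are lost with this kind of split, so the window size should follow some natural unit in your application.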
Best,
Philippe
Edited 1 time(s). Last edit at 05/17/2012 08:40AM by webmasterphilfv.