Data format for Sequential Patterns with time-series

The Data Mining Forum

Data format for Sequential Patterns with time-series

Posted by: YougyZ

Date: May 16, 2012 01:36AM

Hello all,

I am new to sequential pattern mining. I am currently doing an internship in which I will implement this method.

I need help with the dataset format required for sequential pattern mining on time-series data.

I have prepared an example dataset, shown below:

Time              M_T_ambiante  DeltaT  M_T_Ext  M_Rayonnement
20/06/2011 00:00  Medium        Normal  Medium   Low
20/06/2011 00:05  Medium        Normal  Medium   Low
20/06/2011 00:10  Medium        Normal  Medium   Low
20/06/2011 00:15  Medium        Normal  Medium   Low
20/06/2011 00:20  Medium        Normal  Medium   Low
20/06/2011 00:25  Medium        Normal  Medium   Low
20/06/2011 00:30  Medium        Normal  Medium   Low

Can this dataset already be used with a sequential pattern mining algorithm such as GSP, or with something else?

Thanks in advance for your help.

Regards,
Yogi

Re: Data format for Sequential Patterns with time-series

Posted by: webmasterphilfv

Date: May 16, 2012 07:15PM

Hello Yogi,

Welcome to the forum.

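Sequential pattern mining algorithms like GSP take as input: (1) a sequence database and (2) a parameter called "minsup", the minimum support threshold, i.e. the minimum number (or percentage) of sequences in which a pattern must appear to be reported.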

A sequence database is a set of sequences.

The goal of sequential pattern mining is to find subsequences that are common to several sequences.

In your case, I think that you have only a single sequence.

I will give you an example of a sequence database.

For example, here are three sequences named s1, s2 and s3:

s1: (a), (b, c), (d), (e)
s2: (b), (d), (e), (f)
s3: (b), (d), (e), (f)

The first sequence means that item "a" occurred, and was followed by "b" and "c" at the same time, then followed by "d" and then by "e".

The second sequence means that "b" was followed by "d", followed by "e", followed by "f".

The third...

So if you have a sequence database, you can find some subsequences that are common to several sequences with algorithms like GSP and PrefixSpan.

For example, those algorithms could find that the subsequence (b), (e) appears in s1, s2 and s3.

Another example is that (e), (f) appears in s2 and s3.
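To make the counting concrete, here is a minimal Python sketch that checks which of the three sequences above contain a given subsequence. It is only an illustration of the definition, not GSP or PrefixSpan themselves; the function names `contains` and `support` are made up for this example, and minsup is assumed to be an absolute number of sequences.

```python
from typing import Dict, FrozenSet, List

Itemset = FrozenSet[str]
Sequence = List[Itemset]

# The three example sequences; each itemset holds items that occurred together.
database: Dict[str, Sequence] = {
    "s1": [frozenset("a"), frozenset("bc"), frozenset("d"), frozenset("e")],
    "s2": [frozenset("b"), frozenset("d"), frozenset("e"), frozenset("f")],
    "s3": [frozenset("b"), frozenset("d"), frozenset("e"), frozenset("f")],
}


def contains(sequence: Sequence, pattern: Sequence) -> bool:
    """Return True if every itemset of `pattern` is included, in order,
    in some itemset of `sequence` (gaps between matches are allowed)."""
    pos = 0
    for itemset in sequence:
        if pos < len(pattern) and pattern[pos] <= itemset:  # subset test
            pos += 1
    return pos == len(pattern)


def support(db: Dict[str, Sequence], pattern: Sequence) -> int:
    """Number of sequences of the database that contain the pattern."""
    return sum(1 for seq in db.values() if contains(seq, pattern))


minsup = 2  # hypothetical threshold: keep patterns found in at least 2 sequences
b_then_e = [frozenset("b"), frozenset("e")]
e_then_f = [frozenset("e"), frozenset("f")]

print(support(database, b_then_e))  # 3 (s1, s2 and s3)
print(support(database, e_then_f))  # 2 (s2 and s3)
print(support(database, e_then_f) >= minsup)  # True
```

A real algorithm does not test candidate patterns one by one like this; it generates and prunes candidates so that only subsequences that can still reach minsup are counted.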

I wrote this quickly just to give you an idea of what those algorithms do.

If you have only a single sequence, these algorithms may not be appropriate unless you can divide your sequence into several sequences and it makes sense for your application to do that.
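As a rough sketch of what dividing one sequence into several could look like, the Python below cuts the sensor table from the first message into fixed-size chunks and turns each row into an itemset of "column=value" items. The chunk size of 3 readings, the item encoding and the function names are assumptions made for illustration; in practice a cut that matches the application (one sequence per hour or per day, for instance) would probably be more meaningful.

```python
from typing import FrozenSet, List, Tuple

COLUMNS = ["M_T_ambiante", "DeltaT", "M_T_Ext", "M_Rayonnement"]

# The seven readings from the example table (00:00 to 00:30, every 5 minutes).
ROWS: List[Tuple[str, ...]] = [
    (f"20/06/2011 00:{minute:02d}", "Medium", "Normal", "Medium", "Low")
    for minute in range(0, 35, 5)
]


def row_to_itemset(row: Tuple[str, ...]) -> FrozenSet[str]:
    """Drop the timestamp and encode each measurement as one item."""
    return frozenset(f"{col}={val}" for col, val in zip(COLUMNS, row[1:]))


def split_into_sequences(rows, window_size: int) -> List[List[FrozenSet[str]]]:
    """Cut the single long sequence into consecutive chunks of
    `window_size` readings; the last chunk may be shorter."""
    itemsets = [row_to_itemset(r) for r in rows]
    return [
        itemsets[i:i + window_size]
        for i in range(0, len(itemsets), window_size)
    ]


sequences = split_into_sequences(ROWS, window_size=3)
for n, seq in enumerate(sequences, start=1):
    print(f"s{n}: {len(seq)} itemsets")
```

With this sample every row has the same values, so all chunks look alike; the split only becomes informative once the readings actually vary over time.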

Best,

Philippe

Edited 1 time(s). Last edit at 05/17/2012 08:40AM by webmasterphilfv.

Re: Data format for Sequential Patterns with time-series

Posted by: webmasterphilfv

Date: May 16, 2012 07:18PM

By the way, there are some algorithms specialised for a single sequence, like WINEPI or MINEPI, for what is called "episode mining".

But this is a different problem from sequential pattern mining.
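To give a feeling for the difference, here is a simplified Python sketch of window-based counting on a single event sequence, in the spirit of WINEPI: an episode is considered frequent if it occurs in a large enough fraction of sliding windows. This is not the actual WINEPI algorithm; the event data, the function names and the assumptions (integer timestamps, one event per timestamp, serial episodes only) are all made up for illustration.

```python
from typing import List, Tuple

Event = Tuple[int, str]  # (integer timestamp, event type)


def occurs_in_order(events: List[str], episode: List[str]) -> bool:
    """True if the event types of `episode` appear in `events` in the
    same order, possibly with other events in between."""
    pos = 0
    for event in events:
        if pos < len(episode) and event == episode[pos]:
            pos += 1
    return pos == len(episode)


def window_frequency(sequence: List[Event], episode: List[str], width: int) -> float:
    """Fraction of sliding windows of `width` time units that contain
    the episode. Windows overlapping the first or last event only
    partially are included, so early and late events are counted
    in as many windows as the others."""
    first, last = sequence[0][0], sequence[-1][0]
    starts = range(first - width + 1, last + 1)
    hits = 0
    for start in starts:
        window = [e for t, e in sequence if start <= t < start + width]
        if occurs_in_order(window, episode):
            hits += 1
    return hits / len(starts)


# Hypothetical single sequence of events.
sequence = [(1, "A"), (2, "B"), (3, "A"), (4, "B"), (5, "C")]

# "A followed by B" occurs in 2 of the 6 windows of width 2.
print(window_frequency(sequence, ["A", "B"], width=2))
```

Here the input is one long sequence and the count is over windows, whereas in sequential pattern mining the count is over the sequences of a database.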

Re: Data format for Sequential Patterns with time-series

Posted by: YougyZ

Date: May 18, 2012 03:49AM

Thanks Philippe for your response, that helps me a lot.

I think episode mining is interesting too, because what I want to do is use the patterns found with sequential pattern mining to do anomaly detection in my sensor network.

Well, I think I must either try to modify the dataset to fit sequential pattern mining, or use episode mining.

Btw, thanks again.
