Re: Frequent subsequences for prediction

The Data Mining Forum

Frequent subsequences for prediction

Posted by: Anonymous User

Date: June 19, 2012 12:52AM

Hi, I'm confused about how to approach a problem that involves using sequences for prediction.
I have 200 different sequences over an alphabet of 14 letters. The sequences all have different lengths, ranging from 20 to 600.
Each sequence describes a process that is afterwards assessed on 5 features, so for each sequence I also have a vector of 5 Pass or Fail evaluations.

I'm trying to predict the Pass or Fail vector from the sequences. I thought that mining frequent subsequences (with BIDE+, for example) might be a starting point, and that then trying to classify with ID3 for each of the 5 features might work, but I'm just starting and not sure whether this is a good approach.

Alessandra

Re: Frequent subsequences for prediction

Posted by: webmasterphilfv

Date: June 24, 2012 02:02AM

Hello Alessandra,

It seems like a good approach.
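
To make that approach a bit more concrete, here is a rough Python sketch of the step between mining and classification: turning each sequence into a vector of 0/1 values that say which patterns it contains. The function names, the toy sequences and the pattern list are all made up for illustration. In practice the patterns would come from a miner such as BIDE+, and the resulting rows could be given to ID3 or another classifier, once for each of the 5 features.

```python
def is_subsequence(pattern, sequence):
    """True if the items of `pattern` appear in `sequence` in order (gaps allowed)."""
    it = iter(sequence)
    # `item in it` advances the iterator, so the order of the items is respected.
    return all(item in it for item in pattern)


def support(pattern, sequences):
    """Fraction of the sequences that contain `pattern` as a subsequence."""
    return sum(is_subsequence(pattern, s) for s in sequences) / len(sequences)


def to_features(sequences, patterns):
    """One row per sequence, one 0/1 column per pattern."""
    return [[int(is_subsequence(p, s)) for p in patterns] for s in sequences]


# Toy data (hypothetical). Real patterns would be the output of the miner.
sequences = ["ABCAB", "ACB", "BBA", "CAB"]
patterns = ["AB", "CB", "BA"]
features = to_features(sequences, patterns)
```

Each row of `features` can then be paired with the Pass/Fail value of one evaluation feature to train a classifier for that feature.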

Here is another approach that could work. To predict the vector of a sequence, you could search for the most similar sequence(s) (like "k-nearest neighbor"). But to do that, you would need to define a measure of similarity to compare how similar two sequences are. A measure of similarity that you could use is the length of the longest common subsequence shared by the two sequences, for example.
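
The nearest-neighbor idea could be sketched in Python roughly as follows. This is only a minimal illustration under some assumptions of mine: the similarity is the length of the longest common subsequence divided by the length of the longer sequence (the normalisation is my choice, because your sequences vary a lot in length), and each entry of the Pass/Fail vector is predicted by a separate majority vote. The names and the toy data are hypothetical, and the toy label vectors have only 2 entries to keep the example short.

```python
from collections import Counter


def lcs_length(a, b):
    """Length of the longest common subsequence of a and b (dynamic programming)."""
    prev = [0] * (len(b) + 1)
    for x in a:
        curr = [0]
        for j, y in enumerate(b, start=1):
            curr.append(prev[j - 1] + 1 if x == y else max(prev[j], curr[j - 1]))
        prev = curr
    return prev[-1]


def similarity(a, b):
    """LCS length divided by the longer length, so the value lies in [0, 1]."""
    if not a or not b:
        return 0.0
    return lcs_length(a, b) / max(len(a), len(b))


def knn_predict(query, train_seqs, train_labels, k=3):
    """Predict each Pass/Fail entry by majority vote among the k most similar sequences."""
    ranked = sorted(range(len(train_seqs)),
                    key=lambda i: similarity(query, train_seqs[i]),
                    reverse=True)
    neighbours = [train_labels[i] for i in ranked[:k]]
    # Vote independently for each position of the label vector.
    return [Counter(column).most_common(1)[0][0] for column in zip(*neighbours)]


# Toy data (hypothetical): "P" = Pass, "F" = Fail.
train_seqs = ["ABCABC", "ABCABD", "ABCCBC", "DDDADD", "DDADDD"]
train_labels = [["P", "P"], ["P", "F"], ["P", "P"], ["F", "F"], ["F", "P"]]
prediction = knn_predict("ABCABB", train_seqs, train_labels, k=3)
```

With an odd k and two possible values per entry, the vote cannot end in a tie. The value of k itself is something you would have to tune.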

To evaluate the accuracy of prediction, you could use cross-validation.
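
For example, k-fold cross-validation could be sketched like this in Python. It is a sketch under assumptions, not a definitive implementation: `predict` stands for any function that takes a query sequence plus the training sequences and their label vectors, and the `majority_baseline` predictor and the toy data are made up just so that the example runs. It reports one accuracy value per entry of the Pass/Fail vector.

```python
import random
from collections import Counter


def k_fold_indices(n, k, seed=0):
    """Shuffle the indices 0..n-1 and split them into k folds of near-equal size."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    return [idx[i::k] for i in range(k)]


def cross_validate(seqs, labels, predict, k=5, seed=0):
    """Per-feature accuracy of predict(query, train_seqs, train_labels) over k folds."""
    n_features = len(labels[0])
    correct = [0] * n_features
    for fold in k_fold_indices(len(seqs), k, seed):
        held_out = set(fold)
        train = [i for i in range(len(seqs)) if i not in held_out]
        train_seqs = [seqs[i] for i in train]
        train_labels = [labels[i] for i in train]
        for i in fold:
            predicted = predict(seqs[i], train_seqs, train_labels)
            for f in range(n_features):
                correct[f] += predicted[f] == labels[i][f]
    # Every sequence is held out exactly once, so divide by the total count.
    return [c / len(seqs) for c in correct]


def majority_baseline(query, train_seqs, train_labels):
    """Hypothetical baseline: ignore the query, return the most common value per feature."""
    return [Counter(column).most_common(1)[0][0] for column in zip(*train_labels)]


# Toy data (hypothetical): 10 sequences, label vectors with 2 entries.
seqs = ["AB", "BA", "ABC", "CBA", "AAB", "BBA", "ABB", "BAA", "CAB", "BAC"]
labels = [["P", "F"]] * 8 + [["F", "F"]] * 2
scores = cross_validate(seqs, labels, majority_baseline, k=5)
```

A baseline like this is useful as a reference point: a real predictor should do better than always guessing the most common value.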

This is my idea about your problem.

Best,

Philippe

Edited 2 time(s). Last edit at 06/24/2012 02:06AM by webmasterphilfv.
