The Data Mining Forum
IMPORTANT: This is the old Data Mining forum.
I keep it online so that you can read the old messages.

Please post your new messages in the new forum:
Frequent subsequences for prediction
Posted by: Anonymous User
Date: June 19, 2012 12:52AM

Hi, I'm unsure how to approach a problem that involves using sequences for prediction.
I have 200 different sequences built from an alphabet of 14 letters. The sequences have different lengths, ranging from 20 to 600 symbols.
Each sequence describes a process that is afterwards assessed on 5 features, so for each sequence I also have a vector of 5 Pass/Fail evaluations.

I'm trying to predict the Pass/Fail vector from the sequences. I thought that mining frequent subsequences (with BIDE+, for example) might be a starting point, and that I could then classify the sequences with ID3, building one tree for each of the 5 features. However, I'm just starting out and I'm not sure whether this is a good approach.
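A minimal sketch of this pipeline follows, under some assumptions. BIDE+ itself is not implemented here: the pattern list is assumed to have been mined already (for example by an existing BIDE+ implementation), and the patterns, sequences and labels below are hypothetical toy data. Each sequence is turned into a binary vector (1 if it contains a pattern as a subsequence, 0 otherwise), and a small ID3 tree without pruning is trained for one Pass/Fail feature. In practice you would train one such tree per feature.

```python
from collections import Counter
from math import log2


def contains_subsequence(sequence, pattern):
    """True if pattern appears in sequence in order (gaps allowed)."""
    it = iter(sequence)
    # "symbol in it" consumes the iterator, so order is enforced
    return all(symbol in it for symbol in pattern)


def to_features(sequences, patterns):
    """Binary matrix: row i, column j is 1 if sequence i contains pattern j."""
    return [[1 if contains_subsequence(s, p) else 0 for p in patterns]
            for s in sequences]


def entropy(labels):
    total = len(labels)
    return -sum((c / total) * log2(c / total) for c in Counter(labels).values())


def id3(rows, labels, features):
    """Minimal ID3 over binary features.
    A leaf is a label; an internal node is (feature_index, {0: subtree, 1: subtree})."""
    if len(set(labels)) == 1:
        return labels[0]
    majority = Counter(labels).most_common(1)[0][0]
    if not features:
        return majority

    def gain(f):
        remainder = 0.0
        for v in (0, 1):
            subset = [lab for row, lab in zip(rows, labels) if row[f] == v]
            if subset:
                remainder += len(subset) / len(labels) * entropy(subset)
        return entropy(labels) - remainder

    best = max(features, key=gain)  # attribute with the highest information gain
    branches = {}
    for v in (0, 1):
        idx = [i for i, row in enumerate(rows) if row[best] == v]
        if not idx:
            branches[v] = majority  # empty branch: fall back to the majority label
        else:
            branches[v] = id3([rows[i] for i in idx], [labels[i] for i in idx],
                              [f for f in features if f != best])
    return (best, branches)


def predict(tree, row):
    while isinstance(tree, tuple):
        feature, branches = tree
        tree = branches[row[feature]]
    return tree


# Hypothetical toy data: 4 sequences, 3 mined patterns, labels for ONE of the 5 features
sequences = ["ABCA", "ABD", "CAB", "DDC"]
patterns = ["AB", "CA", "D"]
labels_feature1 = ["Pass", "Fail", "Pass", "Fail"]

X = to_features(sequences, patterns)
tree = id3(X, labels_feature1, list(range(len(patterns))))
print(predict(tree, to_features(["CAAB"], patterns)[0]))
```

Here the pattern "CA" alone separates Pass from Fail, so the learned tree is a single split on it. With 200 sequences, BIDE+ may return many patterns; keeping only patterns above a chosen support threshold limits the number of features.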


Re: Frequent subsequences for prediction
Date: June 24, 2012 02:02AM

Hello Alessandra,

It seems like a good approach.

Here is another approach that could work. To predict the vector of a sequence, you could search for the most similar sequence(s) in your data (as in "k-nearest neighbors") ;) To do that, you would need to define a similarity measure to compare how similar two sequences are. One measure you could use, for example, is the length of the longest common subsequence shared by the two sequences.
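A minimal sketch of this idea, assuming the similarity is the longest common subsequence (LCS) length divided by the length of the longer sequence. That normalization is my own choice, made because the lengths range from 20 to 600; other normalizations are possible. Each of the 5 features is predicted by a majority vote among the k most similar training sequences. The function names and toy data are hypothetical.

```python
from collections import Counter


def lcs_length(a, b):
    """Length of the longest common subsequence (dynamic programming, O(len(a)*len(b)))."""
    prev = [0] * (len(b) + 1)
    for x in a:
        curr = [0]
        for j, y in enumerate(b, start=1):
            if x == y:
                curr.append(prev[j - 1] + 1)
            else:
                curr.append(max(prev[j], curr[j - 1]))
        prev = curr
    return prev[-1]


def similarity(a, b):
    """LCS length normalized by the longer sequence, so the value lies in [0, 1]."""
    if not a and not b:
        return 1.0
    return lcs_length(a, b) / max(len(a), len(b))


def knn_predict(query, train_sequences, train_vectors, k=3):
    """Predict each Pass/Fail feature by majority vote of the k most similar sequences."""
    ranked = sorted(range(len(train_sequences)),
                    key=lambda i: similarity(query, train_sequences[i]),
                    reverse=True)
    neighbors = ranked[:k]
    n_features = len(train_vectors[0])
    return [Counter(train_vectors[i][f] for i in neighbors).most_common(1)[0][0]
            for f in range(n_features)]


# Hypothetical toy data (2 features instead of 5, to keep it short)
train_sequences = ["ABCAB", "ABDAB", "DDCDD"]
train_vectors = [["Pass", "Fail"], ["Pass", "Pass"], ["Fail", "Fail"]]
print(knn_predict("ABCA", train_sequences, train_vectors, k=3))
```

An odd k avoids ties in the vote for two-valued labels. With 200 sequences of up to 600 symbols, computing the LCS against every training sequence is cheap enough to do for each query.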

To evaluate the accuracy of prediction, you could use cross-validation.
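A minimal sketch of k-fold cross-validation for this setting. The predictor is passed in as a function `fit_predict(train_sequences, train_vectors, query)`, a hypothetical interface, so the same loop can evaluate either approach from this thread. Here it is shown with a trivial majority-class baseline, which is useful as a reference point that any real model should beat. The sketch reports accuracy separately for each Pass/Fail feature and does no stratification.

```python
import random
from collections import Counter


def k_fold_indices(n, k, seed=0):
    """Shuffle indices 0..n-1 and split them into k folds of (nearly) equal size."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    return [idx[i::k] for i in range(k)]


def cross_validate(sequences, vectors, fit_predict, k=5, seed=0):
    """Per-feature accuracy: each sequence is predicted exactly once, while held out."""
    n_features = len(vectors[0])
    correct = [0] * n_features
    for fold in k_fold_indices(len(sequences), k, seed):
        held_out = set(fold)
        train_seqs = [s for i, s in enumerate(sequences) if i not in held_out]
        train_vecs = [v for i, v in enumerate(vectors) if i not in held_out]
        for i in fold:
            predicted = fit_predict(train_seqs, train_vecs, sequences[i])
            for f in range(n_features):
                correct[f] += predicted[f] == vectors[i][f]
    return [c / len(sequences) for c in correct]


def majority_baseline(train_seqs, train_vecs, query):
    """Ignores the query and predicts the most common label of each feature."""
    n_features = len(train_vecs[0])
    return [Counter(v[f] for v in train_vecs).most_common(1)[0][0]
            for f in range(n_features)]


# Hypothetical toy data
sequences = ["AB", "ABC", "CA", "BCA", "AAB", "CB"]
vectors = [["Pass", "Fail"]] * 6
print(cross_validate(sequences, vectors, majority_baseline, k=3))
```

With only 200 sequences, 10-fold cross-validation or leave-one-out (k equal to the number of sequences) are common choices.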

That's my idea for your problem.



Edited 2 time(s). Last edit at 06/24/2012 02:06AM by webmasterphilfv.


This forum is powered by Phorum and provided by P. Fournier-Viger (© 2012).