Hello Alessandra,
It seems like a good approach.
Here is another approach that could work. To predict the vector of a sequence, you could search for the most similar sequence(s), as in a "k-nearest neighbors" approach. To do that, you would need to define a similarity measure for comparing two sequences. One measure that you could use is the length of the longest common subsequence shared by the two sequences, for example.
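Here is a minimal sketch of that idea in Python. It assumes that each training sequence is a list of items paired with a numeric vector, and that the prediction is the average of the vectors of the k most similar sequences. The averaging step, the function names lcs_length and predict_vector, and the sample data are only illustrative assumptions.

```python
def lcs_length(a, b):
    """Length of the longest common subsequence of sequences a and b."""
    # prev[j] is the LCS length of the part of a processed so far and b[:j].
    prev = [0] * (len(b) + 1)
    for x in a:
        curr = [0]
        for j, y in enumerate(b, start=1):
            if x == y:
                curr.append(prev[j - 1] + 1)
            else:
                curr.append(max(prev[j], curr[j - 1]))
        prev = curr
    return prev[-1]


def predict_vector(query, training, k=1):
    """Average the vectors of the k training sequences most similar to query.

    training is a list of (sequence, vector) pairs (an assumed layout);
    similarity is the raw LCS length.
    """
    ranked = sorted(training,
                    key=lambda pair: lcs_length(query, pair[0]),
                    reverse=True)
    neighbours = [vec for _, vec in ranked[:k]]
    dims = len(neighbours[0])
    return [sum(v[d] for v in neighbours) / len(neighbours)
            for d in range(dims)]


# Made-up data, for illustration only.
training = [
    (["a", "b", "c", "d"], [1.0, 0.0]),
    (["a", "b", "d"], [0.0, 1.0]),
    (["x", "y", "z"], [5.0, 5.0]),
]
print(predict_vector(["a", "b", "c"], training, k=2))  # [0.5, 0.5]
```

Note that the raw LCS length tends to favor long sequences. Dividing it by the length of the longer sequence is one possible way to normalize it, if that matters for your data.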
To evaluate the accuracy of prediction, you could use cross-validation.
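As a rough sketch, k-fold cross-validation could look like the following in Python. It assumes the data is a list of (sequence, vector) pairs and that the error is measured as the Euclidean distance between the predicted and the actual vector. The names cross_validate and mean_baseline are hypothetical, and mean_baseline is only a placeholder for whatever predictor you want to evaluate.

```python
import math


def cross_validate(data, predict, folds=5):
    """Mean Euclidean error of predict over `folds` held-out splits.

    data is a list of (sequence, vector) pairs (an assumed layout);
    predict(sequence, training) must return a vector.
    """
    errors = []
    for f in range(folds):
        # Hold out every folds-th example starting at index f.
        test = data[f::folds]
        train = [pair for i, pair in enumerate(data) if i % folds != f]
        for seq, actual in test:
            predicted = predict(seq, train)
            errors.append(math.dist(predicted, actual))
    return sum(errors) / len(errors)


def mean_baseline(sequence, training):
    """Placeholder predictor: ignores the sequence and returns the mean training vector."""
    vectors = [vec for _, vec in training]
    return [sum(col) / len(vectors) for col in zip(*vectors)]


# Made-up data, for illustration only.
data = [(["a"], [0.0]), (["b"], [2.0]), (["c"], [4.0])]
print(cross_validate(data, mean_baseline, folds=3))  # 2.0
```

With folds equal to the number of examples, as here, this becomes leave-one-out cross-validation. It is usually a good idea to shuffle the data before splitting it, and folds should be at least 2 so that the training part is never empty.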
That is my suggestion for your problem.
Best,
Philippe
Edited 2 time(s). Last edit at 06/24/2012 02:06AM by webmasterphilfv.