Yes, some algorithms of SPMF can output the sequences where a pattern appears. It is simple to do, but it requires a little bit of programming if you use the source code of SPMF. The version of PrefixSpan that saves its result to a file can be modified as follows to show the sequences where a pattern appears:
In the file AlgoPrefixSpan in the folder ca.pfv.spmf.algorithms.sequentialpatterns.BIDE_and_prefixspan,
change this:
private void savePattern(SequentialPattern prefix) throws IOException {
    // increase the number of patterns found, for statistics purposes
    patternCount++;
    // if the result should be saved to a file
    if(writer != null){
        // create a string buffer
        StringBuffer r = new StringBuffer("");
        // for each itemset in this sequential pattern
        for(Itemset itemset : prefix.getItemsets()){
            // for each item
            for(Integer item : itemset.getItems()){
                r.append(item.toString()); // add the item
                r.append(' ');
            }
            r.append("-1 "); // add the itemset separator
        }
        // add the support
        r.append("#SUP: ");
        r.append(prefix.getAbsoluteSupport());
        // write the string to the file
        writer.write(r.toString());
        // start a new line
        writer.newLine();
    }
    // otherwise the result is kept in memory
    else{
        patterns.addSequence(prefix, prefix.size());
    }
}
to this:
private void savePattern(SequentialPattern prefix) throws IOException {
    // increase the number of patterns found, for statistics purposes
    patternCount++;
    // if the result should be saved to a file
    if(writer != null){
        // create a string buffer
        StringBuffer r = new StringBuffer("");
        // for each itemset in this sequential pattern
        for(Itemset itemset : prefix.getItemsets()){
            // for each item
            for(Integer item : itemset.getItems()){
                r.append(item.toString()); // add the item
                r.append(' ');
            }
            r.append("-1 "); // add the itemset separator
        }
        // add the support
        r.append("#SUP: ");
        r.append(prefix.getAbsoluteSupport());
        r.append(" SEQUENCE IDS : ");
        // for each sequence that contains this sequential pattern
        for(Integer sequenceID : prefix.getSequenceIDs()){
            r.append(sequenceID); // add the sequence ID
            r.append(' ');
        }
        // write the string to the file
        writer.write(r.toString());
        // start a new line
        writer.newLine();
    }
    // otherwise the result is kept in memory
    else{
        patterns.addSequence(prefix, prefix.size());
    }
}
After that, if you run the algorithm using the test file MainTestPrefixSpan_saveToFile, each pattern in the result will be followed by the IDs of the sequences containing that pattern:
1 -1 #SUP: 4 SEQUENCE IDS : 0 1 2 3
1 -1 6 -1 #SUP: 2 SEQUENCE IDS : 0 2
1 -1 1 -1 #SUP: 2 SEQUENCE IDS : 0 1
1 -1 3 -1 #SUP: 4 SEQUENCE IDS : 0 1 2 3
1 -1 3 -1 1 -1 #SUP: 2 SEQUENCE IDS : 0 1
1 -1 3 -1 3 -1 #SUP: 3 SEQUENCE IDS : 0 1 3
1 -1 3 -1 2 -1 #SUP: 3 SEQUENCE IDS : 1 2 3
1 -1 2 -1 #SUP: 4 SEQUENCE IDS : 0 1 2 3
1 -1 2 -1 1 -1 #SUP: 2 SEQUENCE IDS : 0 1
1 -1 2 -1 3 -1 #SUP: 2 SEQUENCE IDS : 0 3
1 -1 2 3 -1 #SUP: 2 SEQUENCE IDS : 0 1
1 -1 2 3 -1 1 -1 #SUP: 2 SEQUENCE IDS : 0 1
1 2 -1 #SUP: 2 SEQUENCE IDS : 0 2
1 2 -1 6 -1 #SUP: 2 SEQUENCE IDS : 0 2
1 2 -1 3 -1 #SUP: 2 SEQUENCE IDS : 0 2
1 2 -1 4 -1 #SUP: 2 SEQUENCE IDS : 0 2
1 2 -1 4 -1 3 -1 #SUP: 2 SEQUENCE IDS : 0 2
1 -1 4 -1 #SUP: 2 SEQUENCE IDS : 0 2
1 -1 4 -1 3 -1 #SUP: 2 SEQUENCE IDS : 0 2
2 -1 #SUP: 4 SEQUENCE IDS : 0 1 2 3
2 -1 6 -1 #SUP: 2 SEQUENCE IDS : 0 2
2 -1 1 -1 #SUP: 2 SEQUENCE IDS : 0 1
2 -1 3 -1 #SUP: 3 SEQUENCE IDS : 0 2 3
2 3 -1 #SUP: 2 SEQUENCE IDS : 0 1
2 3 -1 1 -1 #SUP: 2 SEQUENCE IDS : 0 1
2 -1 4 -1 #SUP: 2 SEQUENCE IDS : 0 2
2 -1 4 -1 3 -1 #SUP: 2 SEQUENCE IDS : 0 2
3 -1 #SUP: 4 SEQUENCE IDS : 0 1 2 3
3 -1 1 -1 #SUP: 2 SEQUENCE IDS : 0 1
3 -1 3 -1 #SUP: 3 SEQUENCE IDS : 0 1 3
3 -1 2 -1 #SUP: 3 SEQUENCE IDS : 1 2 3
4 -1 #SUP: 3 SEQUENCE IDS : 0 1 2
4 -1 3 -1 #SUP: 3 SEQUENCE IDS : 0 1 2
4 -1 3 -1 2 -1 #SUP: 2 SEQUENCE IDS : 1 2
4 -1 2 -1 #SUP: 2 SEQUENCE IDS : 1 2
5 -1 #SUP: 3 SEQUENCE IDS : 1 2 3
5 -1 6 -1 #SUP: 2 SEQUENCE IDS : 2 3
5 -1 6 -1 3 -1 #SUP: 2 SEQUENCE IDS : 2 3
5 -1 6 -1 3 -1 2 -1 #SUP: 2 SEQUENCE IDS : 2 3
5 -1 6 -1 2 -1 #SUP: 2 SEQUENCE IDS : 2 3
5 -1 1 -1 #SUP: 2 SEQUENCE IDS : 2 3
5 -1 1 -1 3 -1 #SUP: 2 SEQUENCE IDS : 2 3
5 -1 1 -1 3 -1 2 -1 #SUP: 2 SEQUENCE IDS : 2 3
5 -1 1 -1 2 -1 #SUP: 2 SEQUENCE IDS : 2 3
5 -1 3 -1 #SUP: 2 SEQUENCE IDS : 2 3
5 -1 3 -1 2 -1 #SUP: 2 SEQUENCE IDS : 2 3
5 -1 2 -1 #SUP: 2 SEQUENCE IDS : 2 3
5 -1 2 -1 3 -1 #SUP: 2 SEQUENCE IDS : 2 3
6 -1 #SUP: 3 SEQUENCE IDS : 0 2 3
6 -1 3 -1 #SUP: 2 SEQUENCE IDS : 2 3
6 -1 3 -1 2 -1 #SUP: 2 SEQUENCE IDS : 2 3
6 -1 2 -1 #SUP: 2 SEQUENCE IDS : 2 3
6 -1 2 -1 3 -1 #SUP: 2 SEQUENCE IDS : 2 3
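If you then want to read these lines back in your own program, something like the following could work. It is only a minimal sketch that assumes the exact line format shown above ("#SUP: " followed by " SEQUENCE IDS : "). The class PatternOutputParser and its methods are names I made up for illustration; they are not part of SPMF.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical helper (not part of SPMF): parses lines such as
// "1 -1 6 -1 #SUP: 2 SEQUENCE IDS : 0 2"
public class PatternOutputParser {

    private static final String SUPPORT_MARKER = "#SUP:";
    private static final String IDS_MARKER = "SEQUENCE IDS :";

    // Returns the pattern part of the line, e.g. "1 -1 6 -1"
    public static String parsePattern(String line) {
        return line.substring(0, line.indexOf(SUPPORT_MARKER)).trim();
    }

    // Returns the support, i.e. the number written after "#SUP:"
    public static int parseSupport(String line) {
        int start = line.indexOf(SUPPORT_MARKER) + SUPPORT_MARKER.length();
        int end = line.indexOf(IDS_MARKER);
        return Integer.parseInt(line.substring(start, end).trim());
    }

    // Returns the sequence IDs written after "SEQUENCE IDS :"
    public static List<Integer> parseSequenceIds(String line) {
        int start = line.indexOf(IDS_MARKER) + IDS_MARKER.length();
        List<Integer> ids = new ArrayList<>();
        for (String token : line.substring(start).trim().split("\\s+")) {
            if (!token.isEmpty()) {
                ids.add(Integer.parseInt(token));
            }
        }
        return ids;
    }

    public static void main(String[] args) {
        // Two lines copied from the result above
        String[] lines = {
            "1 -1 6 -1 #SUP: 2 SEQUENCE IDS : 0 2",
            "5 -1 #SUP: 3 SEQUENCE IDS : 1 2 3"
        };
        // Map each pattern to the IDs of the sequences that contain it
        Map<String, List<Integer>> idsByPattern = new LinkedHashMap<>();
        for (String line : lines) {
            idsByPattern.put(parsePattern(line), parseSequenceIds(line));
        }
        System.out.println(idsByPattern);
    }
}
```

In practice you would read the lines from the output file instead of a hard-coded array. Note that the number of sequence IDs on a line should always be equal to the support of the pattern.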
For the other question, I don't know much about sequence clustering, but I think that some related work has perhaps been done in bioinformatics.
Best,
Philippe
Edited 2 time(s). Last edit at 10/08/2013 05:06PM by webmasterphilfv.