Hi Ron,
Thanks for the suggestion.
I wrote some code for you. If you want to use a parenthesis format, you can replace the loadFile() method in the SequenceDatabase.java of the algorithm that you are using with this:
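// Needs BufferedReader, File, FileInputStream, InputStreamReader and IOException
// from java.io (presumably already imported for the existing loadFile()).
public void loadFileParenthesisFormat(String path) throws IOException {
    String thisLine;
    BufferedReader myInput = null;
    try {
        FileInputStream fin = new FileInputStream(new File(path));
        myInput = new BufferedReader(new InputStreamReader(fin));
        int seqID = 0;
        while ((thisLine = myInput.readLine()) != null) {
            thisLine = thisLine.trim();
            // skip blank lines and comment lines (starting with '#')
            if (thisLine.isEmpty() || thisLine.charAt(0) == '#') {
                continue;
            }
            Sequence sequence = new Sequence(seqID++);
            Itemset itemset = null;
            // split on any run of whitespace so that double spaces do not yield empty tokens
            String[] split = thisLine.split("\\s+");
            for (String itemString : split) {
                int start = 0;
                int end = itemString.length();
                // an opening parenthesis starts a new itemset
                if (itemString.charAt(0) == '(') {
                    itemset = new Itemset();
                    start = 1;
                }
                // a closing parenthesis ends the current itemset
                boolean closes = itemString.charAt(end - 1) == ')';
                if (closes) {
                    end--;
                }
                if (itemset == null) {
                    throw new IOException("Item outside parentheses: " + itemString);
                }
                Integer item = Integer.parseInt(itemString.substring(start, end));
                itemset.addItem(item);
                if (closes) {
                    sequence.addItemset(itemset);
                    itemset = null;
                }
            }
            sequences.add(sequence);
        }
    } catch (Exception e) {
        e.printStackTrace();
    } finally {
        if (myInput != null) {
            myInput.close();
        }
    }
}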
This method will allow the algorithm to read files according to this format:
(1) (1 2 3) (1 3) (4) (3 6)
(1 4) (3) (2 3) (1 5)
(5 6) (1 2) (4 6) (3) (2)
(5) (7) (1 6) (3) (2) (3)
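If you want to check the parsing logic on its own before touching SPMF, here is a small standalone sketch. It does not use the SPMF Sequence and Itemset classes; the class name ParenthesisFormatDemo and the method parseLine() are just names I made up for this illustration, and plain lists of integers stand in for the real classes.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical standalone demo: not part of SPMF.
class ParenthesisFormatDemo {

    // Parses one line such as "(1) (1 2 3) (1 3)" into a list of itemsets.
    static List<List<Integer>> parseLine(String line) {
        List<List<Integer>> sequence = new ArrayList<>();
        List<Integer> itemset = null;
        for (String token : line.trim().split("\\s+")) {
            if (token.isEmpty()) {
                continue; // happens only for a blank line
            }
            int start = 0;
            int end = token.length();
            // an opening parenthesis starts a new itemset
            if (token.charAt(0) == '(') {
                itemset = new ArrayList<>();
                start = 1;
            }
            // a closing parenthesis ends the current itemset
            boolean closes = token.charAt(end - 1) == ')';
            if (closes) {
                end--;
            }
            if (itemset == null) {
                throw new IllegalArgumentException("Item outside parentheses: " + token);
            }
            itemset.add(Integer.parseInt(token.substring(start, end)));
            if (closes) {
                sequence.add(itemset);
                itemset = null;
            }
        }
        if (itemset != null) {
            throw new IllegalArgumentException("Missing closing parenthesis: " + line);
        }
        return sequence;
    }

    public static void main(String[] args) {
        // prints [[1], [1, 2, 3], [1, 3], [4], [3, 6]]
        System.out.println(parseLine("(1) (1 2 3) (1 3) (4) (3 6)"));
    }
}
```

Each inner list corresponds to one itemset of the sequence, in the order in which it appears on the line.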
For now, I will keep the -1 -2 format as the default SPMF format, to ensure compatibility with previous versions and because other algorithms also use this format. But I will consider changing it in future versions. Thanks for the feedback!
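In the meantime, if your files are already in the -1 -2 format, a small converter could produce the parenthesis format shown above. This is only a rough sketch: it assumes that -1 marks the end of an itemset and -2 the end of a sequence, and the class name SpmfFormatConverter and the method toParenthesisFormat() are hypothetical names, not part of SPMF.

```java
// Hypothetical helper: not part of SPMF.
class SpmfFormatConverter {

    // Converts one line such as "1 -1 1 2 3 -1 -2" to "(1) (1 2 3)".
    // Assumes -1 closes an itemset and -2 closes the sequence.
    static String toParenthesisFormat(String spmfLine) {
        StringBuilder out = new StringBuilder();
        StringBuilder itemset = new StringBuilder();
        for (String token : spmfLine.trim().split("\\s+")) {
            if (token.equals("-2")) {
                break; // end of the sequence
            }
            if (token.equals("-1")) {
                // end of the current itemset: emit it in parentheses
                if (out.length() > 0) {
                    out.append(' ');
                }
                out.append('(').append(itemset).append(')');
                itemset.setLength(0);
            } else {
                if (itemset.length() > 0) {
                    itemset.append(' ');
                }
                itemset.append(token);
            }
        }
        return out.toString();
    }

    public static void main(String[] args) {
        // prints (1) (1 2 3) (1 3) (4) (3 6)
        System.out.println(toParenthesisFormat("1 -1 1 2 3 -1 1 3 -1 4 -1 3 6 -1 -2"));
    }
}
```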
Best,
Philippe
Edited 4 time(s). Last edit at 10/22/2012 05:17AM by webmasterphilfv.