Hi Saiph,
If it runs out of memory, it means that the search space is too large. One reason may be that there are too many patterns or that the patterns are too long.
Increasing the heap size (for example with the JVM -Xmx option) is a good idea. Besides that, there are several ways to reduce the number of patterns and thus improve performance:
- Increase the minsup parameter from 0.2 to something larger. You can start with a high value like 0.9 and then decrease it until you find a value that works well.
- Lower the "maximum whole interval" parameter from 100 to something smaller, for example 5 or 10. That may already be enough.
- Lower the "maximum item interval" parameter from 100 to something smaller.
- Do some preprocessing on your data to eliminate unnecessary information, or split your data in half, for example.
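
To make the first and last suggestions concrete, here is a minimal Python sketch, not part of SPMF. The names find_workable_minsup, try_mining and split_in_half are hypothetical; try_mining stands for whatever you use to launch the algorithm with a given minsup and report whether it finished without running out of memory.

```python
from typing import Callable, List, Optional, Tuple


def find_workable_minsup(try_mining: Callable[[float], bool],
                         start: float = 0.9,
                         floor: float = 0.2,
                         step: float = 0.1) -> Optional[float]:
    """Step minsup down from `start` towards `floor`.

    `try_mining` is a hypothetical callback: it runs the miner with the
    given minsup and returns True if the run completed, False otherwise.
    Returns the lowest minsup that still completed, or None if even the
    starting value failed.
    """
    lowest_ok = None
    steps = int(round((start - floor) / step))
    for i in range(steps + 1):
        # Round to avoid floating-point drift such as 0.5000000000000001.
        minsup = round(start - i * step, 10)
        if not try_mining(minsup):
            break  # lower values only enlarge the search space
        lowest_ok = minsup
    return lowest_ok


def split_in_half(lines: List[str]) -> Tuple[List[str], List[str]]:
    """Split a list of sequences (one per line) into two halves."""
    mid = (len(lines) + 1) // 2
    return lines[:mid], lines[mid:]
```

Keep in mind that after splitting, a relative minsup such as 0.2 is measured against each half separately, so the patterns found in a half are not guaranteed to be the same as those found in the full database.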
Yes, the Hirate & Yamana implementation does not accept strings yet. I am a bit busy right now, but this summer I will probably add a tool to convert text to a sequence database.
By the way, I have added a few new datasets in the datasets section of the website (note that some of them are large and may not work well with some algorithms)!
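
Regarding the string limitation, one possible workaround in the meantime is to map each distinct string to an integer yourself before building the input file. The following Python sketch only shows that mapping step; encode_items is a hypothetical helper, and how the resulting integers are laid out in the file (timestamps, separators) still has to follow the input format described in the documentation.

```python
from typing import Dict, List, Tuple


def encode_items(sequences: List[List[str]]) -> Tuple[List[List[int]], Dict[str, int]]:
    """Replace each distinct string item by a positive integer id.

    Ids are assigned in order of first appearance, starting at 1.
    Returns the encoded sequences and the string-to-id dictionary,
    which is needed later to translate the patterns back to strings.
    """
    ids: Dict[str, int] = {}
    encoded: List[List[int]] = []
    for sequence in sequences:
        row = []
        for token in sequence:
            if token not in ids:
                ids[token] = len(ids) + 1
            row.append(ids[token])
        encoded.append(row)
    return encoded, ids
```

For example, encode_items([["a", "b", "a"], ["c", "b"]]) gives the sequences [[1, 2, 1], [3, 2]] together with the dictionary {"a": 1, "b": 2, "c": 3}.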
Best,
Philippe
Edited 1 time(s). Last edit at 06/19/2013 10:07AM by webmasterphilfv.