The Data Mining Forum
IMPORTANT: This is the old Data Mining forum.
I keep it online so that you can read the old messages.

Please post your new messages in the new forum: https://forum2.philippe-fournier-viger.com/index.php
 
TKS Heap Space Memory Error
Posted by: Selwyn Hector
Date: March 18, 2019 07:29AM

Hello, I am running TKS on a pretty large dataset of about 60,000 rows, but even when running with 2 GB on my school's machine I get an OutOfMemoryError. I am currently trying to get the top 50 sequences with two items required and a minimum pattern length of 12. I know every line in my input contains the two items somewhere. Could anyone more familiar with the algorithms suggest parameters I could use to reduce memory usage, or a way to get the patterns for each half of the dataset and then merge them?

Re: TKS Heap Space Memory Error
Date: March 19, 2019 06:26AM

Hi,

Using a minimum pattern length of 12 is probably one of the reasons why it requires so many resources.

Generally, if you use stricter constraints, the algorithm will be faster. I would suggest the following:
- Use the maximum length constraint instead, and set it to a small value such as 3 or 4. If it runs, you can then increase it to larger values. If you set minlength = 12, the search space will be huge: the minimum length constraint does not help to reduce the search space and actually makes things worse, while the maximum length constraint can greatly reduce the size of the search space.
- You may also consider adding other constraints, such as a maximum gap. This can also help to reduce the number of possibilities.
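To see why a maximum length prunes the search while a minimum length does not, here is a toy depth-first prefix-growth miner in Python. It is only a sketch under simplifying assumptions, not TKS itself: each itemset holds a single item, the minimum support is fixed instead of being raised dynamically as a top-k algorithm does, and the hypothetical `max_gap` parameter is assumed to mean that consecutive matched items may be at most that many positions apart (so `max_gap=1` means adjacent). SPMF's own max gap parameter may be defined differently. The names `occurs`, `support` and `mine` are illustrative only.

```python
from typing import List, Optional, Sequence, Tuple

Pattern = List[str]


def occurs(pattern: Sequence[str], seq: Sequence[str], max_gap: Optional[int]) -> bool:
    """True if `pattern` is a subsequence of `seq`. With `max_gap`, consecutive
    matched positions may be at most `max_gap` apart (assumed: 1 = adjacent)."""

    def match(p_idx: int, prev_pos: int) -> bool:
        if p_idx == len(pattern):
            return True
        start = prev_pos + 1
        if max_gap is None or p_idx == 0:
            end = len(seq)  # first item (or no gap limit): anywhere to the right
        else:
            end = min(len(seq), prev_pos + max_gap + 1)
        # Backtrack over every admissible position for the next pattern item.
        return any(
            seq[pos] == pattern[p_idx] and match(p_idx + 1, pos)
            for pos in range(start, end)
        )

    return match(0, -1)


def support(pattern: Sequence[str], db: Sequence[Sequence[str]], max_gap: Optional[int]) -> int:
    """Number of sequences in `db` that contain `pattern`."""
    return sum(1 for seq in db if occurs(pattern, seq, max_gap))


def mine(
    db: Sequence[Sequence[str]],
    minsup: int,
    min_len: int = 1,
    max_len: Optional[int] = None,
    max_gap: Optional[int] = None,
) -> Tuple[int, List[Pattern]]:
    """Toy depth-first prefix growth; returns (candidates checked, patterns output)."""
    items = sorted({item for seq in db for item in seq})
    checked = 0
    found: List[Pattern] = []

    def grow(prefix: Pattern) -> None:
        nonlocal checked
        for item in items:
            candidate = prefix + [item]
            checked += 1
            if support(candidate, db, max_gap) < minsup:
                continue  # appending items never raises support, so prune here
            if len(candidate) >= min_len:
                found.append(candidate)  # min_len only filters what is output...
            if max_len is None or len(candidate) < max_len:
                grow(candidate)  # ...while max_len decides whether to search deeper

    grow([])
    return checked, found


db = [list("abcd"), list("abcd"), list("abdc")]
for label, options in [
    ("no constraint", {}),
    ("min_len=4", {"min_len": 4}),
    ("max_len=2", {"max_len": 2}),
    ("max_gap=1", {"max_gap": 1}),
]:
    checked, patterns = mine(db, minsup=2, **options)
    print(f"{label}: {checked} candidates checked, {len(patterns)} patterns output")
```

With `min_len=4` the toy miner checks exactly as many candidates as with no constraint, because every long pattern still has to be grown from its shorter prefixes; only the output shrinks. `max_len=2` and `max_gap=1` instead cut whole branches off the search, so fewer candidates are checked.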

Actually, even if a dataset contains only 60k sequences, the search space can still be very large if the sequences are long and similar!
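As a rough illustration of how fast this grows, the sketch below counts the distinct non-empty subsequences of a single sequence using the standard dynamic-programming recurrence. A sequence of n distinct items already has 2^n - 1 of them, and when many sequences are similar, a large share of these can be frequent. This counts subsequences of one sequence only; it is not the exact search space explored by TKS, and the function name is illustrative.

```python
from typing import Dict, Hashable, Sequence


def count_distinct_subsequences(seq: Sequence[Hashable]) -> int:
    """Number of distinct non-empty subsequences of `seq` (O(n) DP)."""
    total = 1  # includes the empty subsequence
    before_last: Dict[Hashable, int] = {}  # item -> `total` just before its previous occurrence
    for item in seq:
        # Every subsequence so far, with and without `item`, minus the ones
        # already created when `item` last occurred (they would be duplicates).
        new_total = 2 * total - before_last.get(item, 0)
        before_last[item] = total
        total = new_total
    return total - 1  # drop the empty subsequence


print(count_distinct_subsequences("abc"))      # 7
print(count_distinct_subsequences("aaa"))      # 3
print(count_distinct_subsequences(range(30)))  # 1073741823
```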

Besides the above suggestions, another possibility is to preprocess the data to remove some irrelevant items, or to apply other transformations that make the data simpler.
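A minimal preprocessing sketch in Python, assuming the input uses SPMF's sequence format (space-separated integer items, -1 closing each itemset, -2 closing each sequence) and assuming that lines starting with '#', '%' or '@' are comments or metadata to copy through unchanged. Which items count as irrelevant depends on your data; the item IDs and file names below are purely hypothetical.

```python
from typing import Iterable, Iterator, List, Set


def remove_items(lines: Iterable[str], irrelevant: Set[str]) -> Iterator[str]:
    """Yield SPMF-format sequence lines with the `irrelevant` item IDs removed."""
    for line in lines:
        stripped = line.strip()
        if not stripped or stripped[0] in "#%@":
            yield line.rstrip("\n")  # assumed comment/metadata line: keep as is
            continue
        itemsets: List[List[str]] = []
        current: List[str] = []
        for token in stripped.split():
            if token == "-1":  # end of an itemset
                if current:  # drop itemsets that became empty
                    itemsets.append(current)
                current = []
            elif token == "-2":  # end of the sequence
                break
            elif token not in irrelevant:
                current.append(token)
        if itemsets:  # drop sequences that became empty
            yield " ".join(" ".join(s) + " -1" for s in itemsets) + " -2"


print(list(remove_items(["1 -1 1 2 3 -1 4 -1 -2", "5 -1 -2"], {"2", "5"})))

# Hypothetical usage on files:
# with open("input.txt") as src, open("filtered.txt", "w") as dst:
#     for out in remove_items(src, {"17", "42"}):
#         dst.write(out + "\n")
```

Itemsets and sequences that become empty are dropped, which changes the number of sequences; keep that in mind when comparing results with those from the original file.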

Best regards



This forum is powered by Phorum and provided by P. Fournier-Viger (© 2012).