The Data Mining Forum
IMPORTANT: This is the old Data Mining forum.
I keep it online so that you can read the old messages.

Please post your new messages in the new forum: https://forum2.philippe-fournier-viger.com/index.php
 
long items problem
Posted by: zlk
Date: January 29, 2019 12:35AM

Hi all,
May I ask a question about sequential pattern mining?
When I used sequential pattern mining algorithms to mine sequences in which an itemset contains 300 items,
I got the following error:
java.lang.OutOfMemoryError: GC overhead limit exceeded
at java.lang.Integer.valueOf(Unknown Source)
How can I solve this problem?
Thanks in advance.

PS: I have already increased -Xmx to 10g.

Re: long items problem
Date: January 29, 2019 06:24AM

Hi,

That means that the algorithm is running out of memory. The reason is that the search space is too large: there are too many possibilities to consider, and the algorithm must keep too much information in memory while searching.
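To give a rough idea of the scale: an itemset with n distinct items has 2^n - 1 non-empty subsets, and if that itemset is frequent, every one of those subsets is frequent too. The small sketch below just computes that count; the class and method names are made up for the illustration and are not part of any data mining library.

```java
import java.math.BigInteger;

// Hypothetical helper, only for illustrating how fast the search space grows.
class SearchSpaceSize {

    // Number of non-empty subsets of an itemset with n distinct items: 2^n - 1.
    static BigInteger nonEmptySubsets(int n) {
        return BigInteger.ONE.shiftLeft(n).subtract(BigInteger.ONE);
    }

    public static void main(String[] args) {
        for (int n : new int[] {3, 10, 300}) {
            System.out.println(n + " items -> " + nonEmptySubsets(n) + " non-empty subsets");
        }
    }
}
```

For 300 items the result is a number with about 90 decimal digits, which is why increasing the heap size alone is unlikely to help when such a long itemset is frequent.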

The solution is to:
- Change the parameter values. For example, if you do sequential pattern mining with minsup = 0.4, you can raise it to a higher value such as 0.9 and then, if that works, lower it gradually. Generally, the number of patterns and the size of the search space can increase exponentially when you decrease a parameter such as the minsup threshold, so it is better to start with a high value and decrease it until you have enough patterns.
- Preprocess your data or use a subset of it to make it smaller (remove items, remove transactions, etc.).
- Add some constraints. Several algorithms let you set additional constraints. For example, for sequential pattern mining, if you use an algorithm such as CM-SPAM, you can specify that you do not want gaps and that a pattern should not contain more than 3 items. This will greatly reduce the search space, make the algorithm faster, and use less memory. Using constraints is a very good way to make the algorithms more efficient.
- Use another algorithm. If you use an algorithm like GSP, keep in mind that it is not the fastest one. If there are multiple algorithms for the same problem, you may consider trying others that are faster or more memory-efficient.
...
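
As an illustration of the preprocessing idea above, here is a minimal sketch that removes items whose support is below minsup before mining. An item that appears in too few sequences cannot be part of any frequent sequential pattern, so dropping it shortens the itemsets without losing patterns for that threshold. The class name, the method names, and the in-memory representation (a sequence as a list of itemsets of integer ids) are assumptions made for this example, not the API of any particular library. It also assumes no gap constraint is used, because dropping emptied itemsets shifts positions within a sequence.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical preprocessing helper; not part of any data mining library.
class InfrequentItemFilter {

    // database: list of sequences; sequence: list of itemsets; itemset: list of item ids.
    // minsup: relative threshold in [0, 1], as a fraction of the number of sequences.
    static List<List<List<Integer>>> removeInfrequentItems(
            List<List<List<Integer>>> database, double minsup) {
        // Support of an item = number of sequences containing it at least once.
        Map<Integer, Integer> support = new HashMap<>();
        for (List<List<Integer>> sequence : database) {
            Set<Integer> seen = new HashSet<>();
            for (List<Integer> itemset : sequence) {
                seen.addAll(itemset);
            }
            for (Integer item : seen) {
                support.merge(item, 1, Integer::sum);
            }
        }
        int minCount = (int) Math.ceil(minsup * database.size());

        List<List<List<Integer>>> filtered = new ArrayList<>();
        for (List<List<Integer>> sequence : database) {
            List<List<Integer>> newSequence = new ArrayList<>();
            for (List<Integer> itemset : sequence) {
                List<Integer> kept = new ArrayList<>();
                for (Integer item : itemset) {
                    if (support.get(item) >= minCount) {
                        kept.add(item);
                    }
                }
                if (!kept.isEmpty()) {
                    newSequence.add(kept); // emptied itemsets are dropped
                }
            }
            // Sequences are kept even when empty, so the number of sequences
            // (and therefore a relative minsup) stays the same.
            filtered.add(newSequence);
        }
        return filtered;
    }

    // Tiny made-up database with four sequences, only for the demonstration.
    static List<List<List<Integer>>> sampleDatabase() {
        return Arrays.asList(
                Arrays.asList(Arrays.asList(1, 2), Arrays.asList(3)),
                Arrays.asList(Arrays.asList(1), Arrays.asList(4)),
                Arrays.asList(Arrays.asList(1, 3)),
                Arrays.asList(Arrays.asList(5)));
    }

    public static void main(String[] args) {
        System.out.println(removeInfrequentItems(sampleDatabase(), 0.5));
    }
}
```

With minsup = 0.5 on the sample data, only items 1 and 3 appear in at least two of the four sequences, so every other item is removed. On real data you would write the filtered sequences back in whatever input format your mining tool expects.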

This should give you some ideas about what to try.

Best regards,

Philippe

Re: long items problem
Posted by: zlk
Date: January 29, 2019 06:47PM

Hi Professor,
Thanks for your detailed answer. I will try more algorithms. I just need to get the sequential patterns; efficiency is not very important to me.

Best wishes



This forum is powered by Phorum and provided by P. Fournier-Viger (© 2012).