How to set the minutil threshold?
Date: March 10, 2022 10:40PM
I received this question today:
I have a doubt about high utility itemset mining. How can we define the minimum utility? I have read many papers and they say that the minimum utility value is based on the user's preferences. Actually, I use trial and error to choose the minimum utility. But is there any logic that can be used to define a good/optimal/correct minimum utility?
It really depends on the data and what you want to do.
High utility itemset mining has many applications. One of them is analyzing customer purchase behavior. Let's say that you have a store where all products are very cheap. Then you may want to use a lower minimum utility threshold, otherwise you will find nothing. But if you have a store where all products are expensive and yield a high profit, you may want to use a higher minimum utility threshold, because otherwise almost every itemset will be high utility.
So what I mean by this example is that for different contexts, the data is different and the threshold must be set differently.
If the threshold is set too low, there are more patterns, and having too many patterns is not desirable because they take up space and there will be too many to look at. Besides, the runtime can increase a lot.
And if the threshold is set too high, then you will not find enough patterns, or you may even find nothing.
So generally, people just start from a high minutil value and decrease it until they find enough patterns, but not too many.
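To make this trial-and-error procedure concrete, here is a minimal sketch in Python. Everything in it is an assumption for illustration: the toy transaction database, the function names (itemset_utilities, mine, lower_until_enough) and the fixed step size are made up, and the brute-force miner only works on tiny data. With a real dataset you would call an actual high utility itemset mining algorithm instead.

```python
from itertools import combinations

# Hypothetical toy database: each transaction maps an item to the
# utility (e.g. profit) that the item yields in that transaction.
TRANSACTIONS = [
    {"a": 5, "b": 10, "c": 1},
    {"a": 10, "c": 6, "e": 3},
    {"b": 8, "c": 2, "d": 6},
    {"a": 5, "b": 4, "e": 3},
]


def itemset_utilities(transactions):
    """Brute force: total utility of every itemset that occurs in the data."""
    utilities = {}
    for t in transactions:
        items = sorted(t)
        for size in range(1, len(items) + 1):
            for itemset in combinations(items, size):
                # Utility of an itemset in a transaction = sum of its items' utilities.
                u = sum(t[i] for i in itemset)
                utilities[itemset] = utilities.get(itemset, 0) + u
    return utilities


def mine(transactions, minutil):
    """Return all itemsets whose utility is at least minutil."""
    all_utils = itemset_utilities(transactions)
    return {s: u for s, u in all_utils.items() if u >= minutil}


def lower_until_enough(transactions, start, step, target_count):
    """Start from a high minutil and lower it until enough patterns are found."""
    minutil = start
    while True:
        patterns = mine(transactions, minutil)
        if len(patterns) >= target_count or minutil <= 1:
            return minutil, patterns
        minutil = max(1, minutil - step)


if __name__ == "__main__":
    minutil, patterns = lower_until_enough(TRANSACTIONS, start=30, step=5, target_count=5)
    print("minutil:", minutil)
    for itemset, utility in sorted(patterns.items(), key=lambda p: -p[1]):
        print(itemset, utility)
```

The step size is a trade-off: a small step means more runs of the miner, while a large step can jump from too few patterns straight to too many.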
To avoid the problem of how to set minutil, some papers have redefined the problem as top-k high utility itemset mining. In that problem, instead of setting minutil, you directly set k, the number of patterns that you want to find. For example, you can say: I want to find the top-500 patterns that make the most money. This is much more intuitive in my opinion than setting the minutil threshold. But it is just another way to look at the problem.
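The top-k formulation can be sketched in the same way. This is only an illustration under assumptions: the toy data and the function names (itemset_utilities, top_k_itemsets) are hypothetical, and the sketch enumerates every itemset by brute force and then keeps the k best. It shows what the top-k problem asks for, not how the top-k algorithms from the papers search efficiently.

```python
import heapq
from itertools import combinations

# Hypothetical toy database: item -> utility (e.g. profit) per transaction.
TRANSACTIONS = [
    {"a": 5, "b": 10, "c": 1},
    {"a": 10, "c": 6, "e": 3},
    {"b": 8, "c": 2, "d": 6},
    {"a": 5, "b": 4, "e": 3},
]


def itemset_utilities(transactions):
    """Brute force: total utility of every itemset that occurs in the data."""
    utilities = {}
    for t in transactions:
        items = sorted(t)
        for size in range(1, len(items) + 1):
            for itemset in combinations(items, size):
                u = sum(t[i] for i in itemset)
                utilities[itemset] = utilities.get(itemset, 0) + u
    return utilities


def top_k_itemsets(transactions, k):
    """Return the k itemsets with the highest utility, best first."""
    utilities = itemset_utilities(transactions)
    return heapq.nlargest(k, utilities.items(), key=lambda pair: pair[1])


if __name__ == "__main__":
    top = top_k_itemsets(TRANSACTIONS, k=3)
    for itemset, utility in top:
        print(itemset, utility)
    # The utility of the k-th pattern is the minutil value that would have
    # returned (at least) these k patterns in the threshold-based setting.
    print("implied minutil:", top[-1][1])
```

The last line shows the link between the two formulations: choosing k implicitly chooses a minutil, namely the utility of the k-th best pattern.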
So in conclusion, there is not really one way to decide what the best minutil value is... The best value depends on your application and how many patterns you want to find.