Re: Interestingness of patterns.
Date: June 11, 2021 07:44AM
Good evening,
Yes, there are indeed many pattern types (itemsets, sequential patterns, episodes, etc.), and also many different measures (support, utility, etc.) that can be used to select patterns.
How to know if a pattern is interesting?
Some criteria are subjective (more like an opinion). For example, I discover that {bread,milk} is purchased by many customers, but I know this already, so for me it is not interesting because it is not novel.
Some other criteria are more objective. For example, in top-k high utility itemset mining, we want to find the k patterns that yield the most profit. Money is something that directly makes sense for a company.
But for several pattern mining tasks or measures, whether a pattern is interesting is less clear. For example, in frequent itemset mining, we aim to find patterns that appear frequently in the data. But even if such patterns are frequent, they may still be just the result of chance. For example, if everyone buys bread in a store, I could find a pattern {bread,computer}, but that would be a spurious pattern: the two items appear together many times only because everyone buys bread, not because of a special relationship between them.
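To make the {bread,computer} example concrete, here is a minimal Python sketch. The transactions are made up for illustration, and the lift measure is my own choice for this example (it is just one common way to compare the observed support with the support expected if the items were independent). A lift close to 1 suggests the items co-occur about as often as chance alone would predict.

```python
# Toy data (hypothetical): everyone buys bread, half of the customers buy a computer.
transactions = [
    {"bread", "computer"},
    {"bread", "computer"},
    {"bread", "milk"},
    {"bread"},
]

def support(transactions, itemset):
    """Fraction of transactions that contain every item of the itemset."""
    itemset = set(itemset)
    return sum(1 for t in transactions if itemset <= t) / len(transactions)

def lift(transactions, a, b):
    """Observed support of {a, b} divided by the support expected under independence."""
    expected = support(transactions, {a}) * support(transactions, {b})
    return support(transactions, {a, b}) / expected

print(support(transactions, {"bread", "computer"}))  # frequent: half of the transactions
print(lift(transactions, "bread", "computer"))       # lift of 1: no more than chance
```

Here {bread,computer} is frequent only because bread is in every transaction, which is exactly the spurious case described above.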
So to address this kind of problem, an interesting research direction is to find "correlated patterns" or "statistically significant patterns". For example, in correlated itemset mining, we look for patterns whose items are strongly correlated with each other according to some measure such as the bond.
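As a rough illustration of the bond, here is a small Python sketch. It assumes the usual definition of the bond of an itemset: the number of transactions containing all of its items, divided by the number of transactions containing at least one of its items. The data is hypothetical, and a real correlated itemset mining algorithm would of course not compute this by scanning the database for each itemset.

```python
# Toy data (hypothetical): milk and cereal are always bought together.
transactions = [
    {"bread", "computer"},
    {"bread", "milk", "cereal"},
    {"bread", "milk", "cereal"},
    {"bread"},
]

def bond(transactions, itemset):
    """Conjunctive support divided by disjunctive support of the itemset."""
    itemset = set(itemset)
    all_items = sum(1 for t in transactions if itemset <= t)  # contains every item
    any_item = sum(1 for t in transactions if itemset & t)    # contains at least one item
    return all_items / any_item if any_item else 0.0

print(bond(transactions, {"bread", "computer"}))  # low bond: bread mostly appears without computer
print(bond(transactions, {"milk", "cereal"}))     # bond of 1: the items always appear together
```

The bond is always between 0 and 1, and a higher value means the items are more strongly tied to each other.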
And to find statistically significant patterns, we use statistical tests in pattern mining. The goal is to find patterns that represent something statistically significant. This is interesting for domains such as medical data. You could find a pattern like {drink_water, cancer} which is highly frequent, but a statistical test would likely show that it does not appear more often than expected by chance, so it would be discarded.
In my opinion, using statistical tests is a very promising approach, but the problem is that it makes the algorithms much more complicated.
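To give an idea of what such a test can look like, here is a minimal Python sketch of a one-sided Fisher's exact test for a pair of items. This is only one possible test, chosen by me for illustration, and the counts are invented. Real algorithms for significant pattern mining must also correct for the very large number of patterns that are tested, which this sketch does not do.

```python
from math import comb

def fisher_p_value(n_total, n_a, n_b, n_both):
    """One-sided Fisher's exact test: probability of seeing at least n_both
    co-occurrences of A and B if the two items were independent."""
    upper = min(n_a, n_b)
    favourable = sum(
        comb(n_a, i) * comb(n_total - n_a, n_b - i)
        for i in range(n_both, upper + 1)
    )
    return favourable / comb(n_total, n_b)

# Hypothetical counts: 100 patients, all of them drink water, 10 have cancer.
p_water = fisher_p_value(n_total=100, n_a=100, n_b=10, n_both=10)
print(p_water)  # p-value of 1: the pattern is frequent but fully explained by chance

# Hypothetical counts: a rare factor (10 patients) shared by 8 of the 10 cancer patients.
p_rare = fisher_p_value(n_total=100, n_a=10, n_b=10, n_both=8)
print(p_rare)   # very small p-value: unlikely to be due to chance alone
```

A high p-value like the first one means the pattern would be discarded, even though its support is high.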
Besides all of that, if we want to check if a pattern is interesting, we can ask a domain expert. For example, in a recent paper that I published about alarm correlation rules, we asked some telecommunication experts to validate the rules that we found in a computer network. The experts could mark each rule as good or not, and also indicate whether it was already known or unknown. This is an interesting approach to evaluate a new pattern type.
Another way to evaluate patterns is to try to use them for some task. For example, some papers present a new pattern type and show that the patterns are useful for tasks such as sequence prediction, clustering, etc.
Hope that this helps
Best regards