Re: Evaluation of the generated patterns
Date: July 07, 2021 05:15PM
Hi,
I see.
There are many measures for association rules. I have seen survey papers that list more than 15 different measures. Which one is the best? I think it depends on your application and data.
In some books, such as "Introduction to data mining" by Tan & Kumar, there is a chapter about association rule mining that talks about the evaluation of patterns. The authors show with examples that in some scenarios the confidence is a good measure, but in other scenarios the lift is better, etc. From this, we can see that no single measure is better than the others all the time. Each measure has advantages in some situations, but it depends on the data and, most importantly, on how you want to use the rules.
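For the basic measures, the interpretation can be like this:
support: how frequently the rule appears in the data. Higher is usually viewed as better.
confidence: For an association rule X ==> Y, this is like the conditional probability of Y given X. Higher is better, and the value is between 0 and 100%.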
lift : For an association rule X ==> Y, if the lift is equal to 1, it means that X and Y are independent. If the lift is higher than 1, it means that X and Y are positively correlated. If the lift is lower than 1, it means that X and Y are negatively correlated.
etc.
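To make these three measures concrete, here is a minimal Python sketch that computes them for a rule X ==> Y. The function name rule_measures and the toy transactions are only assumptions for illustration, they do not come from any particular library or dataset. Support is calculated here as a relative frequency, but some tools report it as a count instead.

```python
def rule_measures(transactions, x, y):
    """Return (support, confidence, lift) for the rule X ==> Y.

    transactions: list of sets of items (toy data, assumed for illustration)
    x, y: the antecedent and the consequent, given as iterables of items
    """
    x, y = set(x), set(y)
    n = len(transactions)
    count_x = sum(1 for t in transactions if x <= t)         # transactions containing X
    count_y = sum(1 for t in transactions if y <= t)         # transactions containing Y
    count_xy = sum(1 for t in transactions if (x | y) <= t)  # transactions containing X and Y
    if n == 0 or count_x == 0 or count_y == 0:
        raise ValueError("X and Y must each appear in at least one transaction")
    support = count_xy / n               # relative frequency of X and Y together
    confidence = count_xy / count_x      # estimate of P(Y | X)
    lift = confidence / (count_y / n)    # 1 = independent, >1 positive, <1 negative
    return support, confidence, lift


# Hypothetical toy database of five transactions
transactions = [
    {"bread", "milk"},
    {"bread", "butter"},
    {"bread", "milk", "butter"},
    {"milk"},
    {"bread", "milk"},
]

support, confidence, lift = rule_measures(transactions, {"bread"}, {"milk"})
print(f"support={support:.2f} confidence={confidence:.2f} lift={lift:.2f}")
```

In this toy data, the rule bread ==> milk has a rather high confidence but a lift slightly below 1, because milk is already very frequent by itself. This is the kind of situation where the confidence and the lift give a different picture of the same rule.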
But still, how good a rule is depends on the application.
For example, in some applications like information retrieval, you may be interested in finding the most frequent patterns, so the support may be quite important. But in some other applications, you may want to focus on patterns that are not necessarily frequent but have a strong correlation. In that case, the support is not very important and you may want to focus on a high confidence or lift.
Another example is using the rules for prediction. Some people build classifiers using association rules and use various measures to select the best rules for prediction. Some, for example, multiply the support by the confidence to select the best rule, or use the lift or other measures (I do not remember the details).
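As a rough illustration of the idea of multiplying the support by the confidence, the following Python sketch ranks a few rules by that product. The rules, their values and the function name are all hypothetical, and this is not the exact procedure of any specific classifier from the literature.

```python
def rank_by_support_times_confidence(rules):
    """Sort rules from best to worst by support * confidence.

    rules: list of (name, support, confidence) tuples, with support and
    confidence given as fractions between 0 and 1.
    """
    return sorted(rules, key=lambda r: r[1] * r[2], reverse=True)


# Hypothetical rules with made-up measure values
rules = [
    ("A ==> class1", 0.40, 0.60),
    ("B ==> class1", 0.10, 0.95),
    ("C ==> class2", 0.30, 0.90),
]

for name, sup, conf in rank_by_support_times_confidence(rules):
    print(f"{name}: support*confidence = {sup * conf:.3f}")
```

Note that with this score, the rule with the highest confidence (B ==> class1) is ranked last because its support is low. Whether this is desirable or not depends again on the application.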
Personally, among the dozens of measures that exist, I notice that most people just use the simplest ones like support, confidence and lift.
If you want to interpret many measures, I think you first need to read the definitions of these measures carefully to make sure you understand well what they mean, and then try to interpret them in your application.
Best regards,