Hi Xiaowei,
Yes, I will explain the main idea.
CMDeo, CMRules, RuleGrowth, ERMiner: These algorithms have exactly the same input and exactly the same output. The only difference between these algorithms is how they work internally to find the output. They use different search strategies, and because of this some of them are faster or slower, or use more or less memory. So the difference is about performance. Generally, RuleGrowth is faster than CMRules, and CMRules is usually faster than CMDeo. ERMiner is probably faster than RuleGrowth, but I think that it may use more memory. Of course, it depends on the data, so this may not always be the case. I usually recommend using RuleGrowth or ERMiner.
These four algorithms have two parameters: minimum support and minimum confidence. The goal is to find the rules that have a high support and a high confidence (a support no less than the minimum support and a confidence no less than the minimum confidence).
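To make the two thresholds concrete, here is a small Python sketch. It is not SPMF code, and the function names are made up for this illustration. It assumes the usual definition for these algorithms: a rule X --> Y appears in a sequence if all items of X occur before all items of Y, the support is the fraction of sequences where the rule appears, and the confidence is that count divided by the number of sequences containing X.

```python
def occurs(sequence, antecedent, consequent):
    """True if every antecedent item appears before every consequent item.

    A sequence is a list of itemsets (Python sets). Assumed definition:
    some split point puts all of X on the left and all of Y on the right.
    """
    for split in range(1, len(sequence)):
        before = set().union(*sequence[:split])
        after = set().union(*sequence[split:])
        if antecedent <= before and consequent <= after:
            return True
    return False


def support_and_confidence(database, antecedent, consequent):
    """Return (support, confidence) of the rule antecedent --> consequent."""
    n_rule = sum(occurs(s, antecedent, consequent) for s in database)
    n_antecedent = sum(antecedent <= set().union(*s) for s in database)
    support = n_rule / len(database)
    confidence = n_rule / n_antecedent if n_antecedent else 0.0
    return support, confidence


# Toy database of four sequences (invented for this example).
database = [
    [{"A"}, {"B"}, {"C"}],
    [{"A"}, {"C"}],
    [{"C"}, {"A"}],
    [{"B"}, {"C"}],
]

support, confidence = support_and_confidence(database, {"A"}, {"C"})
min_support, min_confidence = 0.5, 0.6
is_valid = support >= min_support and confidence >= min_confidence
```

Here A --> C appears in two of the four sequences and A appears in three of them, so the rule is kept only if the minimum support is at most 0.5 and the minimum confidence is at most about 0.67.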
TopSeqRules: This algorithm is based on RuleGrowth. It works in the same way, but the parameters are different. The user needs to set a number k and the minimum confidence. Then the algorithm returns the top-k rules, that is, the k rules with the highest support among those that have a confidence no less than the minimum confidence.
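The meaning of the output can be sketched in a few lines of Python. This only illustrates what "top-k" means here; it says nothing about how TopSeqRules actually searches for the rules, and the rule list and its numbers are invented for the example.

```python
import heapq


def top_k_rules(rules, k, min_confidence):
    """Keep rules meeting min_confidence, then return the k with highest support.

    Each rule is a tuple (name, support, confidence).
    """
    eligible = [r for r in rules if r[2] >= min_confidence]
    return heapq.nlargest(k, eligible, key=lambda r: r[1])


# Hypothetical rules with made-up support and confidence values.
rules = [
    ("A --> B", 0.6, 0.90),
    ("A --> C", 0.8, 0.40),  # most frequent, but confidence too low
    ("B --> C", 0.5, 0.70),
    ("C --> D", 0.3, 0.95),
]

best = top_k_rules(rules, k=2, min_confidence=0.6)
```

With k = 2 and a minimum confidence of 0.6, the result is A --> B and B --> C. The rule A --> C is dropped even though it has the highest support, because its confidence is below the threshold.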
TNS: This is similar to TopSeqRules. The difference is that we also remove some rules that are said to be redundant.
TRuleGrowth: This is a modification of RuleGrowth where we let the user specify an additional constraint: the maximum window length. The parameters are the same as for RuleGrowth, plus this new parameter. The idea is the following.
If you look at a sequence like this:
(A) (B) (C) (D) (E) (F) (G) (H)
maybe you could find a rule like this:
A --> H
But as you can see above, A and H are far apart from each other.
So using the new parameter (the maximum window length), you can set a constraint on how far apart the antecedent and consequent of a rule can be. For example, if you set the maximum window to 2, then the rule A --> H will not be considered as appearing in the sequence:
(A) (B) (C) (D) (E) (F) (G) (H)
because A and H will be too far from each other.
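Here is a rough Python sketch of this constraint. It assumes that the window length counts consecutive itemsets and that the whole rule must fit inside one such window; the exact definition used inside TRuleGrowth may differ in its details, and the function names are made up for this illustration.

```python
def occurs(sequence, antecedent, consequent):
    """True if every antecedent item appears before every consequent item."""
    for split in range(1, len(sequence)):
        before = set().union(*sequence[:split])
        after = set().union(*sequence[split:])
        if antecedent <= before and consequent <= after:
            return True
    return False


def occurs_within_window(sequence, antecedent, consequent, window):
    """True if the rule appears inside some run of `window` consecutive itemsets.

    Assumption: the window is measured in itemsets, not in items or time.
    """
    last_start = max(1, len(sequence) - window + 1)
    for start in range(last_start):
        if occurs(sequence[start:start + window], antecedent, consequent):
            return True
    return False


# The example sequence (A) (B) (C) (D) (E) (F) (G) (H).
sequence = [{"A"}, {"B"}, {"C"}, {"D"}, {"E"}, {"F"}, {"G"}, {"H"}]

far_rule = occurs_within_window(sequence, {"A"}, {"H"}, window=2)
near_rule = occurs_within_window(sequence, {"A"}, {"B"}, window=2)
whole_sequence = occurs_within_window(sequence, {"A"}, {"H"}, window=8)
```

With a window of 2, A --> H is rejected while A --> B is still accepted. With a window of 8, which covers the whole sequence, A --> H is accepted again.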
That is the main idea.
Best regards,
Philippe