Sequential Rule Mining - where order of antecedent is important

Posted by: **Ben**

Date: September 01, 2020 06:50AM

Firstly, I really like the SPMF package - very easy to use, and oh so many algorithms to choose from! I can't begin to fathom the years of work that have gone into this.

On to my question: **is there an option to make the order of antecedent events important when mining sequential rules?** Or, if not an option, a hacky alternative?

i.e., instead of:

{A, B, C} ==> Y

I would be looking for separate rules:

A B C ==> Y

A C B ==> Y

B A C ==> Y

B C A ==> Y

C A B ==> Y

C B A ==> Y

From reading your online material, I know you favour - in terms of performance - unordered sequences. But in any case, I would still like to investigate my sequential rules with order taken into account.

I do realise that the number of rules will explode; however, if I limit myself to just 2 or 3 'events' within the antecedent sequence, I don't think it should explode by more than a factor of 6 (3 events can be ordered in 3! = 6 ways, if I remember my combinatorics correctly).
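As a quick check of that factor, here is a minimal Python sketch that enumerates every ordering of a small antecedent. The helper name `ordered_antecedents` is made up for this illustration and is not part of SPMF.

```python
from itertools import permutations
from math import factorial


def ordered_antecedents(events):
    """Return every ordering of the given antecedent events (hypothetical helper)."""
    return list(permutations(events))


for events in (("A", "B"), ("A", "B", "C")):
    orderings = ordered_antecedents(events)
    # The number of orderings of n distinct events is n!.
    assert len(orderings) == factorial(len(events))
    print(len(events), "events ->", len(orderings), "ordered antecedents")
```

So one unordered antecedent of 3 events corresponds to at most 6 ordered ones, and an antecedent of 2 events to at most 2.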

Many thanks again for your sterling work!

Posted by: **Ben**

Date: September 01, 2020 08:52AM

Thinking it over, I've come up with a hacky solution, which goes a little like this:

If I know the support of the following rules:

A ==> B

{A, B} ==> C

{A, B, C} ==> Y

Then I can infer the support of the following **ordered** rule:

A B C ==> Y

i.e., where A, B, and C happen consecutively in that order (rather than in any order).

It's hacky and slow, but it works. If there's a better solution than that (preferably built-in), I'm all ears!

Posted by: **Ben**

Date: September 01, 2020 09:04AM

Hmm, actually, that wouldn't work.

Back to the drawing board.

Posted by: **webmasterphilfv**

Date: September 01, 2020 06:09PM

Hi!

I am happy that you like the software. Thanks for using it :-)

Yes, I indeed prefer the unordered rules, as you have noticed. But I think that for some applications, ordered rules can also be interesting.

If you want to try the ordered rules, you can use the RuleGen algorithm in SPMF.

The idea of that algorithm is very simple:

(1) It first finds the sequential patterns (sequences of symbols that appear many times in your data, like <A,B> and <A,B,C,D>).

(2) Then RuleGen takes these sequential patterns and combines them to make rules that are totally ordered. For example, if you combine the two sequential patterns <A,B,C,D> and <A,B>, you can obtain the ordered rule <A,B> --> <C,D>, and you will also be able to calculate its confidence and support.

I think that this is the easiest way of obtaining the totally ordered sequential rules. It is actually not very complicated because you can reuse the sequential pattern mining algorithms.
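The two steps above can be sketched in Python roughly as follows. This is only an illustration of the idea under simplifying assumptions, not SPMF's RuleGen code: the toy database, the hand-picked candidate patterns, and the function names are all made up for this example, and rules are formed only by splitting a pattern into a prefix (antecedent) and the remaining suffix (consequent). The support of the rule is the support of the full pattern, and the confidence is that support divided by the support of the antecedent.

```python
def is_subsequence(pattern, sequence):
    """True if the items of pattern occur in sequence in the same order (gaps allowed)."""
    remaining = iter(sequence)
    return all(item in remaining for item in pattern)


def support(pattern, database):
    """Number of sequences in the database that contain the pattern."""
    return sum(is_subsequence(pattern, seq) for seq in database)


def ordered_rules(patterns, min_conf):
    """Build totally ordered rules from a dict {pattern tuple: support count}.

    Each pattern is split into prefix ==> suffix; a rule is kept when the
    prefix is itself a known pattern and the confidence reaches min_conf.
    """
    rules = []
    for pattern, sup in patterns.items():
        for cut in range(1, len(pattern)):
            antecedent, consequent = pattern[:cut], pattern[cut:]
            antecedent_sup = patterns.get(antecedent)
            if antecedent_sup is None:
                continue  # antecedent was not mined as a pattern
            confidence = sup / antecedent_sup
            if confidence >= min_conf:
                rules.append((antecedent, consequent, sup, confidence))
    return rules


# Toy sequence database (illustrative data only).
database = [
    ("A", "B", "C", "Y"),
    ("A", "B", "C", "Y"),
    ("A", "B", "C"),
    ("A", "C", "B", "Y"),
]

# Step (1) stand-in: a real miner such as PrefixSpan would discover these
# patterns; here a few candidates are simply listed and counted.
candidates = [("A", "B", "C"), ("A", "B", "C", "Y"),
              ("A", "C", "B"), ("A", "C", "B", "Y")]
patterns = {p: support(p, database) for p in candidates}

# Step (2): combine the patterns into ordered rules.
rules = ordered_rules(patterns, min_conf=0.5)
for antecedent, consequent, sup, conf in rules:
    print(" ".join(antecedent), "==>", " ".join(consequent),
          f"(support={sup}, confidence={conf:.2f})")
```

On this toy data, A B C ==> Y and A C B ==> Y come out as two separate rules with their own support and confidence, which is the ordered behaviour asked about in the first post.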

In SPMF, only the RuleGen algorithm does that. I think that I implemented it using PrefixSpan to discover the sequential patterns. In theory, PrefixSpan could be replaced by other sequential pattern mining algorithms to find other types of totally ordered sequential rules, but I have not done that.

Best regards,

Philippe

Edited 1 time(s). Last edit at 09/01/2020 06:11PM by webmasterphilfv.

Posted by: **Ben**

Date: September 01, 2020 11:41PM

Many thanks for the prompt reply.

Ok - I'll take some time to explore RuleGen and digest what you have said. If I need some clarification, I may come back.
