The Data Mining Forum
IMPORTANT: This is the old Data Mining forum.
I keep it online so that you can read the old messages.

Please post your new messages in the new forum: https://forum2.philippe-fournier-viger.com/index.php
 
VMSP minsup and show sequence id -- sequential pattern mining
Posted by: Leo Liu
Date: August 24, 2015 09:34AM

Hi,
I have two questions about running the VMSP algorithm using the SPMF package.

1) When I set the minsup parameter, the output does not seem to include all patterns with support greater than or equal to minsup. For example, if I run

java -jar spmf.jar run VMSP spmf_test_files/contextPrefixSpan.txt output.txt 50%

The output contains only patterns with sup = 2

When the parameter is changed to 25%, the output contains only patterns where sup = 1, but I'd like to include all patterns with sup >=1. How may I do that?

2) How do I get all the sequence ids for each pattern? The documentation says to set the "show sequence id" parameter to true, but it doesn't seem to work for me.

Thank you.



Edited 1 time(s). Last edit at 08/24/2015 08:41PM by webmasterphilfv.

Re: VMSP minsup and show sequence id
Date: August 24, 2015 08:38PM

Hi,

Thanks for using SPMF and posting on the forum.

Here are the answers to your questions.

ANSWER TO QUESTION 1
Yes, and it is normal because VMSP is an algorithm for discovering maximal sequential patterns. A maximal sequential pattern is a sequential pattern that is frequent (that has a support no less than "minsup") and that is not included in another larger frequent sequential pattern. Thus, as you noticed, some patterns are missing, but that is because they are not maximal.

Let me explain that in more detail by using the example from the documentation.

If we set minsup = 75 %, we obtain the following patterns with VMSP:

6 -1 SUP: 3
5 -1 SUP: 3
4 -1 3 -1 SUP: 3
2 -1 3 -1 SUP: 3
1 -1 3 -1 3 -1 SUP: 3
1 -1 3 -1 2 -1 SUP: 3

Now, if we set minsup = 50%, we obtain the following patterns with VMSP:

6 -1 2 -1 3 -1 SUP: 2
5 -1 2 -1 3 -1 SUP: 2
4 -1 3 -1 2 -1 SUP: 2
1 2 -1 6 -1 SUP: 2
1 -1 3 -1 3 -1 SUP: 3
1 -1 2 -1 3 -1 SUP: 2
5 -1 6 -1 3 -1 2 -1 SUP: 2
5 -1 1 -1 3 -1 2 -1 SUP: 2
1 2 -1 4 -1 3 -1 SUP: 2
1 -1 2 3 -1 1 -1 SUP: 2

Note that most of the patterns found for minsup = 75 % are not found for minsup = 50 % because they are included in the patterns having a support of 50 % and thus are not maximal anymore when we set minsup = 50%.

But you can still notice that the pattern 1 -1 3 -1 3 -1 is there with a support of 3.
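
As a rough illustration (not SPMF code), the containment argument can be checked directly on the two outputs listed above. The helper names below are made up for this sketch, and the line format is the one shown in this thread; a leading "#" before SUP is tolerated in case the real output file writes it that way.

```python
def parse_pattern(line):
    """Parse 'items -1 items -1 SUP: n' into (list of frozensets, support)."""
    body, sup = line.replace("#", "").split("SUP:")
    itemsets, current = [], set()
    for token in body.split():
        if token == "-1":              # -1 closes the current itemset
            itemsets.append(frozenset(current))
            current = set()
        else:
            current.add(int(token))
    return itemsets, int(sup)

def is_contained(small, large):
    """True if the itemsets of `small` fit, in order, inside itemsets of `large`."""
    pos = 0
    for itemset in large:
        if pos < len(small) and small[pos] <= itemset:
            pos += 1
    return pos == len(small)

# Output lines copied from the post (minsup = 75% and minsup = 50%).
OUT_75 = [
    "6 -1 SUP: 3", "5 -1 SUP: 3", "4 -1 3 -1 SUP: 3",
    "2 -1 3 -1 SUP: 3", "1 -1 3 -1 3 -1 SUP: 3", "1 -1 3 -1 2 -1 SUP: 3",
]
OUT_50 = [
    "6 -1 2 -1 3 -1 SUP: 2", "5 -1 2 -1 3 -1 SUP: 2", "4 -1 3 -1 2 -1 SUP: 2",
    "1 2 -1 6 -1 SUP: 2", "1 -1 3 -1 3 -1 SUP: 3", "1 -1 2 -1 3 -1 SUP: 2",
    "5 -1 6 -1 3 -1 2 -1 SUP: 2", "5 -1 1 -1 3 -1 2 -1 SUP: 2",
    "1 2 -1 4 -1 3 -1 SUP: 2", "1 -1 2 3 -1 1 -1 SUP: 2",
]

patterns_50 = [parse_pattern(line)[0] for line in OUT_50]
still_maximal = []
for line in OUT_75:
    pattern, _ = parse_pattern(line)
    # A 75% pattern disappears if a different, larger 50% pattern contains it.
    absorbed = any(pattern != q and is_contained(pattern, q) for q in patterns_50)
    if not absorbed:
        still_maximal.append(line)

print(still_maximal)   # ['1 -1 3 -1 3 -1 SUP: 3']
```

Five of the six patterns found at 75% are contained in a larger pattern found at 50%, which is why only 1 -1 3 -1 3 -1 survives.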

Now if you set minsup = 25%, we will find only four patterns with a support of 1 because they are the maximal patterns. Other patterns are not found because they are all included in these four patterns.

5 -1 7 -1 1 6 -1 3 -1 2 -1 3 -1 SUP: 1
1 4 -1 3 -1 2 3 -1 1 5 -1 SUP: 1
5 6 -1 1 2 -1 4 6 -1 3 -1 2 -1 SUP: 1
1 -1 1 2 3 -1 1 3 -1 4 -1 3 6 -1 SUP: 1

Thus, to answer your question, this behavior is normal by the definition of a maximal sequential pattern.
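
To see where the support values come from, here is a small sketch that counts support by hand. It assumes that the four support-1 patterns listed above are the four input sequences themselves (which is what one would expect at 25% on a four-sequence file) and that a relative minsup is rounded up to a whole number of sequences. Both are assumptions of this example, and the function names are made up.

```python
import math

# The four support-1 patterns from the post, assumed to be the input sequences.
DATABASE = [
    [{5}, {7}, {1, 6}, {3}, {2}, {3}],
    [{1, 4}, {3}, {2, 3}, {1, 5}],
    [{5, 6}, {1, 2}, {4, 6}, {3}, {2}],
    [{1}, {1, 2, 3}, {1, 3}, {4}, {3, 6}],
]

def occurs_in(pattern, sequence):
    """True if the itemsets of `pattern` appear, in order, inside `sequence`."""
    pos = 0
    for itemset in sequence:
        if pos < len(pattern) and pattern[pos] <= itemset:
            pos += 1
    return pos == len(pattern)

def support(pattern):
    """Number of database sequences that contain the pattern."""
    return sum(occurs_in(pattern, seq) for seq in DATABASE)

def absolute_minsup(percent):
    """Assumed conversion: relative threshold rounded up to a sequence count."""
    return math.ceil(percent / 100 * len(DATABASE))

print(support([{1}, {3}, {3}]))                     # 3
print(support([{1, 2}, {6}]))                       # 2
print([absolute_minsup(p) for p in (75, 50, 25)])   # [3, 2, 1]
```

Under these assumptions the counts agree with the SUP values shown in the outputs above.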

If you want "all patterns", you may consider using an algorithm such as CM-SPAM that will find all patterns. You may also consider using ClaSP for "closed sequential patterns". Actually, maximal sequential patterns are a subset of closed sequential patterns which are a subset of all sequential patterns.
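
The subset relationship can be illustrated on a tiny, made-up result set. The patterns and supports below are hypothetical, and sequences are simplified to one item per itemset; this is only a sketch of the definitions, not how CM-SPAM, ClaSP or VMSP work internally.

```python
def is_subsequence(small, large):
    """True if the items of `small` appear in `large` in the same order."""
    it = iter(large)
    return all(item in it for item in small)

# Hypothetical output of an "all patterns" miner: pattern -> support.
ALL_PATTERNS = {
    ("a",): 3,
    ("b",): 2,
    ("a", "b"): 2,
}

def proper_supers(p):
    """Frequent patterns that strictly contain p."""
    return [q for q in ALL_PATTERNS if q != p and is_subsequence(p, q)]

# Closed: no strictly larger frequent pattern has the same support.
closed = {p for p, s in ALL_PATTERNS.items()
          if all(ALL_PATTERNS[q] != s for q in proper_supers(p))}

# Maximal: no strictly larger frequent pattern at all.
maximal = {p for p in ALL_PATTERNS if not proper_supers(p)}

print(sorted(closed))    # [('a',), ('a', 'b')]
print(sorted(maximal))   # [('a', 'b')]
```

Here ("b",) is frequent but not closed, and ("a",) is closed but not maximal, so each set is strictly smaller than the previous one.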

ANSWER TO QUESTION 2

You are right, this was an error in the documentation because that feature was not implemented for VMSP. I have just updated the code for you on the website so that it can now show sequence identifiers. I have added the feature to VMSP, TKS, VGEN and SPAM. Thanks for reporting this issue. I will also update the documentation later.
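
If you post-process the output file yourself, a sketch like the following may help. It assumes each line has the form "pattern #SUP: n #SID: ids"; please check this against your own output file, since the exact layout is an assumption here. The sample line and the function name are made up for illustration.

```python
def parse_line(line):
    """Split a line assumed to look like '1 -1 3 -1 #SUP: 3 #SID: 0 1 3'."""
    pattern_part, rest = line.split("#SUP:")
    sup_part, _, sid_part = rest.partition("#SID:")
    sids = [int(token) for token in sid_part.split()]
    return pattern_part.strip(), int(sup_part), sids

# Made-up sample line in the assumed format.
sample = "1 -1 3 -1 3 -1 #SUP: 3 #SID: 0 1 3"
pattern, sup, sids = parse_line(sample)

# Sanity checks: one identifier per supporting sequence, no repeats.
has_duplicates = len(sids) != len(set(sids))
print(pattern, sup, sids, has_duplicates)
```

A line without the "#SID:" part still parses and simply yields an empty list of identifiers.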

Best regards,

Philippe



Edited 3 time(s). Last edit at 08/24/2015 08:40PM by webmasterphilfv.

Re: VMSP minsup and show sequence id
Posted by: Leo Liu
Date: August 24, 2015 09:49PM

Thank you for the detailed explanation and the code update! This is awesome.

I tried to print the sequence identifier for my dataset. Some of the sequence ids seem to be repeated in the output file. Other than this, it works perfectly.

Thanks again!

Re: VMSP minsup and show sequence id
Posted by: Leo Liu
Date: August 24, 2015 09:58PM

Actually I was doing something wrong. The repeated sequence ids are due to some bugs on my end. My apologies.

Re: VMSP minsup and show sequence id
Date: August 25, 2015 03:15AM

Ok. Glad that it works!

Philippe

Re: VMSP minsup and show sequence id
Posted by: DataMinded
Date: March 09, 2018 05:48AM

I have implemented the VMSP algorithm. When I set showSequenceIdentifiersInOutput to true, however, I get a Java heap space out-of-memory error. This doesn't occur when I set it to false. Why is this, and is it fixable?

Re: VMSP minsup and show sequence id
Date: March 09, 2018 06:50AM

This is not a bug. It happens because the algorithm needs too much memory and runs out of free memory on your computer. Sequential pattern mining is a hard problem. If you want to make the algorithm faster and decrease its memory usage:
1) You can increase the minsup threshold
2) You can use some additional constraints. For example, if you use algorithms such as CM-SPAM, you can apply a lot of constraints such as the maximum pattern length. Using this constraint can greatly reduce the search space of patterns and make the algorithm much faster and reduce its memory usage.
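
To give a feeling for how much a maximum pattern length can shrink the search space, here is a back-of-the-envelope count. It only considers the simplest case, a single sequence of n distinct items with one item per itemset, so it illustrates the order of magnitude rather than what any SPMF algorithm actually explores. The function name is made up.

```python
from math import comb

def count_subsequences(n, max_length=None):
    """Count non-empty subsequences of a sequence of n distinct single items."""
    limit = n if max_length is None else min(n, max_length)
    return sum(comb(n, k) for k in range(1, limit + 1))

print(count_subsequences(20))      # 1048575 candidate patterns without a limit
print(count_subsequences(20, 3))   # 1350 with a maximum pattern length of 3
```
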

The problem with the parameter "Show identifiers?" is that if you set it to true, the algorithm needs to calculate the sequence identifiers for each pattern found, and this increases the memory usage. But if you add some constraints as suggested above, that may solve the problem.

Best,



This forum is powered by Phorum and provided by P. Fournier-Viger (© 2012).