The Data Mining Forum
IMPORTANT: This is the old Data Mining forum.
I keep it online so that you can read the old messages.

Please post your new messages in the new forum: https://forum2.philippe-fournier-viger.com/index.php
 
Obtain specific item IDs after obtaining frequent patterns in SPMF
Posted by: Francisco
Date: March 07, 2018 12:47AM

Hi all,

First of all, I'd like to thank Professor Fournier-Viger and his team for the amazing work on SPMF.

I've been playing with the software for my research, and one question has come up: after running any sequential pattern mining algorithm on my sequence database, I can get the patterns, their support, and the sequence IDs. Is there any way to obtain, for each SID, exactly which items in the sequence made up the pattern match, so I can get more information about them?

I guess it is not possible directly, but maybe somewhere in the source code there are points where I can store that item information to retrieve it afterwards.

Thank you very much in advance. Any clue on this is welcome.

Fran

Re: Obtain specific item IDs after obtaining frequent patterns in SPMF
Date: March 09, 2018 07:32AM

Hi,

Thanks for your message. It took me a little while to answer because I have been quite busy this week. I have also modified the code a little to provide a feature that is close to what you asked for, though maybe not exactly what you want.

Let me explain. There is a hidden feature in SPMF where you can specify the names of the items. To do that, if you download the new version of SPMF, you can encode your file as follows:

@CONVERTED_FROM_TEXT
@ITEM=1=apple
@ITEM=2=orange
@ITEM=3=tomato
@ITEM=4=milk
@ITEM=5=bread
@ITEM=6=noodle
@ITEM=7=rice
@ITEM=-1=|
1 -1 1 2 3 -1 1 3 -1 4 -1 3 6 -1 -2
1 4 -1 3 -1 2 3 -1 1 5 -1 -2
5 6 -1 1 2 -1 4 6 -1 3 -1 2 -1 -2
5 -1 7 -1 1 6 -1 3 -1 2 -1 3 -1 -2

Then, you must use the user interface or the command line of SPMF to run a sequential pattern mining algorithm such as CM-SPAM. The items in the result file will then be automatically replaced by their names:

apple | #SUP: 4
orange | #SUP: 4
tomato | #SUP: 4
milk | #SUP: 3
bread | #SUP: 3
noodle | #SUP: 3
apple | apple | #SUP: 2
apple | orange | #SUP: 4
apple | orange | apple | #SUP: 2
apple | orange | tomato | #SUP: 2

In the above example, in addition to replacing the item IDs by their names, I have replaced the -1 separator by |, to make the results more readable.
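To make the file layout above concrete, here is a minimal Python sketch that reads the `@ITEM` header lines and decodes one sequence into item names. It is only an illustration of the format shown in this post, not SPMF's own code; the function name `decode_spmf` is hypothetical.

```python
def decode_spmf(lines):
    """Split an SPMF text file into its @ITEM name table and decoded sequences."""
    names = {}
    sequences = []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith("@ITEM="):
            # "@ITEM=1=apple" means that item id 1 is named "apple"
            item_id, name = line[len("@ITEM="):].split("=", 1)
            names[int(item_id)] = name
        elif line.startswith("@"):
            continue  # other header lines such as @CONVERTED_FROM_TEXT
        else:
            itemsets, current = [], []
            for token in line.split():
                value = int(token)
                if value == -2:    # -2 marks the end of the sequence
                    break
                if value == -1:    # -1 marks the end of an itemset
                    itemsets.append(current)
                    current = []
                else:
                    current.append(names.get(value, token))
            sequences.append(itemsets)
    return names, sequences


example = """@CONVERTED_FROM_TEXT
@ITEM=1=apple
@ITEM=2=orange
@ITEM=3=tomato
@ITEM=-1=|
1 -1 1 2 3 -1 1 3 -1 -2
""".splitlines()

names, sequences = decode_spmf(example)
print(sequences[0])
# [['apple'], ['apple', 'orange', 'tomato'], ['apple', 'tomato']]
```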

You could also have used the "show sequence identifiers" option, and this would work as well!

Hope that this can be helpful!

If you want to do the same thing from the source code, it is of course possible, but it is more complicated.

I have not described this feature in the documentation yet. It is a kind of hidden feature ;-)

Best regards,

Philippe

Re: Obtain specific item IDs after obtaining frequent patterns in SPMF
Posted by: Francisco
Date: March 12, 2018 07:15AM

Dear Professor,

Thank you very much for your detailed response and for showing us this hidden feature.

My original question was, however, a bit different. What I need to determine is, given a specific sequence that has been marked as "containing" a given pattern, where in the sequence (by an identifier or a position, for example) the pattern or patterns were found.

The reason is that, in my application, I have much more information associated with each given item in separate files, but in the same positions for each sequence, so I really need to identify each item in the extracted pattern within a sequence in order to also extract the rest of the information from my database.

Fran

Re: Obtain specific item IDs after obtaining frequent patterns in SPMF
Date: March 12, 2018 07:59AM

Dear Fran,

I see. This feature would be possible, but it may not be so easy to add. One tricky point is that a pattern may appear multiple times in the same sequence. For example, the pattern (A,B)(A) appears at least 5 times in the sequence (A,B,C)(A,B)(A,B,D)(A,C). So, if I understand correctly, you would like the program to indicate each occurrence of the pattern in that sequence?

For example, it could be said that
the first occurrence of (A,B)(A) is in the 1st itemset and the 2nd itemset
the second occurrence of (A,B)(A) is in the 1st itemset and the 3rd itemset
....

Because each algorithm is different, I think that adding this feature would require choosing one algorithm and modifying it for that purpose. Another way, which would be less efficient, would be a post-processing step that highlights the occurrences of each pattern in each sequence where it appears. This would maybe be easier to do. Which algorithm are you using?
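As a rough sketch of what such a post-processing step could look like, the Python snippet below enumerates every occurrence of a pattern in one sequence, reading the example as the pattern (A,B)(A) and the sequence (A,B,C)(A,B)(A,B,D)(A,C). The function name `occurrences` is hypothetical, and this is a naive enumeration without gap constraints, not code taken from SPMF.

```python
def occurrences(pattern, sequence, start=0):
    """Yield every tuple of 1-based itemset positions where `pattern` occurs."""
    if not pattern:
        yield ()
        return
    first, rest = pattern[0], pattern[1:]
    for pos in range(start, len(sequence)):
        # A pattern itemset matches if it is a subset of the sequence itemset.
        if first <= sequence[pos]:
            # The rest of the pattern must match strictly later itemsets.
            for tail in occurrences(rest, sequence, pos + 1):
                yield (pos + 1,) + tail


sequence = [{"A", "B", "C"}, {"A", "B"}, {"A", "B", "D"}, {"A", "C"}]
pattern = [{"A", "B"}, {"A"}]

found = list(occurrences(pattern, sequence))
print(found)
# [(1, 2), (1, 3), (1, 4), (2, 3), (2, 4), (3, 4)]
```

Under this reading the enumeration finds six occurrences, which agrees with "at least 5". The number of occurrences can grow quickly with pattern and sequence length, which is one reason to treat this as a separate post-processing step.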

Best,

Philippe

Re: Obtain specific item IDs after obtaining frequent patterns in SPMF
Posted by: Francisco
Date: March 14, 2018 04:18AM

Dear Professor,

Once again, thanks for your prompt response.

I am actually working on sequential pattern mining, especially with VGEN, VMSP or HirateYamana. In general, it would be nice to have a "generic" mechanism to obtain the specific IDs so it could be used in all algorithms.

As you say, there are a couple of tricky points:

- Patterns can appear multiple times in a sequence.
- Gaps between pattern elements can appear in a sequence, which makes a post-processing approach really difficult.

Fran

Re: Obtain specific item IDs after obtaining frequent patterns in SPMF
Posted by: Francisco
Date: March 22, 2018 08:59AM

Dear Professor,

After a long study, VMSP is our algorithm of interest. I have implemented a post-processing search that works correctly, but obviously only when the gap is fixed to 1. Otherwise, the post-processing search is too tricky.

So, I have decided to give it a try and start thinking about modifying the VMSP code so that it stores the exact itemsets in which each occurrence of the pattern appears.

Just a preliminary question: is there any document that sketches the data structures and main ideas that underlie the SPMF implementation? (e.g. the concept of bitmap, the structures in which candidate items are stored before they are considered a pattern, etc.)

Thanks again,

Francisco

Re: Obtain specific item IDs after obtaining frequent patterns in SPMF
Date: March 24, 2018 07:16AM

Dear Francisco,

Sorry for the delay in answering. I have been quite busy and did not check the forum for a few days.

The main data structure used in VMSP is the bitmap structure that stores lists of sequence identifiers. This bitmap representation was proposed in the SPAM algorithm. So if you want to understand all the details of the bitmap structure, I recommend reading the paper about SPAM. It explains the bitmap structure with some examples.

That bitmap structure is used in VMSP. Actually, VMSP can be viewed as an extension of SPAM where we only want to keep the maximal patterns. VMSP uses some additional structures, described in the VMSP paper, to keep the maximal patterns.

By the way, the bitmap structure in my implementation is the same as in the SPAM paper, except perhaps that I have added some optimizations, for example recording the first and last 1 in the bitmap to reduce its size.
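For readers who have not seen the SPAM bitmap idea, here is a simplified Python sketch for a single sequence. It assumes one bit per itemset for each item, and it is an illustration only, not the SPMF implementation (which concatenates all sequences and adds the optimizations mentioned above). The function names are hypothetical. The sequence used is the first one from the example database earlier in this thread: (1)(1,2,3)(1,3)(4)(3,6).

```python
def s_step_transform(bits):
    """Keep only the positions strictly after the first 1 (all set to 1)."""
    if 1 not in bits:
        return [0] * len(bits)
    first = bits.index(1)
    return [0] * (first + 1) + [1] * (len(bits) - first - 1)


def s_extend(pattern_bits, item_bits):
    """Sequence extension: the item must appear in a later itemset."""
    return [t & i for t, i in zip(s_step_transform(pattern_bits), item_bits)]


def i_extend(pattern_bits, item_bits):
    """Itemset extension: the item must appear in the same itemset."""
    return [p & i for p, i in zip(pattern_bits, item_bits)]


# One bit per itemset of the sequence (1)(1,2,3)(1,3)(4)(3,6)
apple = [1, 1, 1, 0, 0]   # item 1
orange = [0, 1, 0, 0, 0]  # item 2
tomato = [0, 1, 1, 0, 1]  # item 3

apple_then_orange = s_extend(apple, orange)
print(apple_then_orange)                   # [0, 1, 0, 0, 0]
print(s_extend(apple_then_orange, apple))  # [0, 0, 1, 0, 0]
print(i_extend(apple, tomato))             # [0, 1, 1, 0, 0]
```

Note that each resulting bitmap only marks the itemsets where the pattern can end. The positions of the earlier items are not kept, which is why recovering full occurrences would need extra bookkeeping.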

Best

Philippe

Re: Obtain specific item IDs after obtaining frequent patterns in SPMF
Posted by: jbd
Date: December 17, 2018 08:48AM

Dear Prof. Philippe, Dear Francisco,

This topic is interesting as I am working on something very similar and need to get extra information about a frequent sequence as well.

I would like to ask Francisco whether he has found any relevant solution to this problem by now. I would also like to ask Prof. Philippe whether the SPADE algorithm, which stores the id-lists of sequences, already offers a solution. As far as I understand, SPADE records the ID of the sequence as well as the itemset (or event ID) in which the pattern appears. Can't we get as output each frequent sequence along with its list of sequence IDs and event IDs?

Thank you,
jbd



Edited 1 time(s). Last edit at 12/17/2018 12:09PM by jbd.

Re: Obtain specific item IDs after obtaining frequent patterns in SPMF
Date: December 19, 2018 06:04AM

Hi jbd,

Yes, it could be done. The information is already stored in the data structure, so it would only be necessary to add a little bit of code to save it to the file and a new parameter to activate that. Then, I would have to update the documentation.
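To illustrate the kind of information an id-list holds, here is a small Python sketch of SPADE-style vertical id-lists, built from the four example sequences given earlier in this thread. It is a simplified illustration, not SPMF's SPADE code; the function names are hypothetical, and sequence and event IDs are numbered from 0 here.

```python
def build_idlists(database):
    """Map each item to its id-list: the (sid, eid) pairs where it appears."""
    idlists = {}
    for sid, sequence in enumerate(database):
        for eid, itemset in enumerate(sequence):
            for item in itemset:
                idlists.setdefault(item, []).append((sid, eid))
    return idlists


def temporal_join(prefix_idlist, item_idlist):
    """Id-list of 'prefix, then item later': the item's (sid, eid) pairs that
    come after the earliest end of the prefix in the same sequence."""
    earliest = {}
    for sid, eid in prefix_idlist:
        earliest[sid] = min(eid, earliest.get(sid, eid))
    return [(sid, eid) for sid, eid in item_idlist
            if sid in earliest and eid > earliest[sid]]


def support(idlist):
    """Number of distinct sequences in an id-list."""
    return len({sid for sid, _ in idlist})


database = [
    [{1}, {1, 2, 3}, {1, 3}, {4}, {3, 6}],
    [{1, 4}, {3}, {2, 3}, {1, 5}],
    [{5, 6}, {1, 2}, {4, 6}, {3}, {2}],
    [{5}, {7}, {1, 6}, {3}, {2}, {3}],
]

idlists = build_idlists(database)
joined = temporal_join(idlists[1], idlists[2])  # apple, then orange
print(joined)           # [(0, 1), (1, 2), (2, 4), (3, 4)]
print(support(joined))  # 4
```

The joined id-list gives, for each sequence, the event IDs where the pattern can end. It matches the support of 4 reported for "apple | orange" above, but like the bitmap it does not record where the earlier items of the pattern were matched.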

If you need it, I can do it for the next version of SPMF next week.

Best regards,

Philippe

Re: Obtain specific item IDs after obtaining frequent patterns in SPMF
Posted by: jbd
Date: December 19, 2018 07:25AM

Hi Prof. Philippe,

Thank you for your response.

Yes, it would in fact be perfect to have it, because this is exactly what I need for my work! Thank you very much for that.

Best regards,
jbd



Edited 1 time(s). Last edit at 01/14/2019 04:09AM by jbd.

Re: Obtain specific item IDs after obtaining frequent patterns in SPMF
Posted by: jbd
Date: January 23, 2019 11:36AM

Dear Prof. Philippe,

I have a question, please. Among the traditional sequential pattern mining algorithms (and more precisely the vertical ones), is there one that stores all the occurrences of a pattern in each sequence, or do they all consider only one occurrence?

Thank you,
jbd

Re: Obtain specific item IDs after obtaining frequent patterns in SPMF
Date: January 24, 2019 05:04AM

As for the traditional algorithms for sequential pattern mining, I think none of them calculates all occurrences and stores them explicitly.

The closest algorithm that does something like that is maybe Hirate & Yamana. It is an algorithm that extends PrefixSpan for timestamped data. It considers each timestamp as different, and thus in some way treats each occurrence as different because the timestamps are not the same. But it does not really store each occurrence explicitly.

As we discussed previously, maybe one of the simplest solutions is to add an algorithm that calculates the occurrences by post-processing. It would maybe require a few hours of programming. That should not be too hard to do. Basically, it would be an algorithm that takes a pattern file and a database as input and outputs all the occurrences by comparing each pattern with each sequence.
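A minimal Python sketch of such a post-processing step is given below. It assumes pattern lines of the form `items #SUP: n #SID: ids` with sequence IDs counted from 0, and it adds an optional `max_gap` argument for the gap issue raised earlier in the thread. All function names are hypothetical and positions are 0-based; this is an illustration, not an SPMF algorithm or its output format.

```python
def parse_itemsets(tokens):
    """Turn SPMF tokens ('1', '2', '-1', ..., '-2') into a list of itemsets."""
    itemsets, current = [], set()
    for token in tokens:
        if token == "-2":
            break
        if token == "-1":
            itemsets.append(current)
            current = set()
        else:
            current.add(int(token))
    return itemsets


def parse_pattern_line(line):
    """Split 'items #SUP: n #SID: ids' into (pattern, list of sequence ids)."""
    body, _, sid_part = line.partition("#SID:")
    items = body.split("#SUP:")[0]
    return parse_itemsets(items.split()), [int(s) for s in sid_part.split()]


def find_occurrences(pattern, sequence, max_gap=None, start=0, prev=None):
    """Return all tuples of itemset positions where the pattern occurs."""
    if not pattern:
        return [()]
    results = []
    for pos in range(start, len(sequence)):
        # Stop once the distance to the previous matched itemset is too large.
        if prev is not None and max_gap is not None and pos - prev > max_gap:
            break
        if pattern[0] <= sequence[pos]:
            for tail in find_occurrences(pattern[1:], sequence, max_gap, pos + 1, pos):
                results.append((pos,) + tail)
    return results


def occurrences_by_sid(pattern_line, db_lines, max_gap=None):
    """Compare one pattern with the sequences listed after #SID."""
    database = [parse_itemsets(l.split()) for l in db_lines
                if l.strip() and not l.startswith("@")]
    pattern, sids = parse_pattern_line(pattern_line)
    return {sid: find_occurrences(pattern, database[sid], max_gap) for sid in sids}


db_lines = [
    "1 -1 1 2 3 -1 1 3 -1 4 -1 3 6 -1 -2",
    "1 4 -1 3 -1 2 3 -1 1 5 -1 -2",
    "5 6 -1 1 2 -1 4 6 -1 3 -1 2 -1 -2",
    "5 -1 7 -1 1 6 -1 3 -1 2 -1 3 -1 -2",
]
line = "1 -1 2 -1 1 -1 #SUP: 2 #SID: 0 1"

print(occurrences_by_sid(line, db_lines))
# {0: [(0, 1, 2)], 1: [(0, 2, 3)]}
print(occurrences_by_sid(line, db_lines, max_gap=1))
# {0: [(0, 1, 2)], 1: []}
```

With `max_gap=1` only consecutive itemsets are accepted, so the second sequence no longer yields an occurrence.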

Re: Obtain specific item IDs after obtaining frequent patterns in SPMF
Posted by: jbd
Date: January 24, 2019 05:11AM

Thank you for the detailed response.

Best regards,
jbd

Re: Obtain specific item IDs after obtaining frequent patterns in SPMF
Date: January 26, 2019 08:19AM

Hi,

Yesterday, I worked on the problem of finding all occurrences of a sequential pattern. I have put a new version of SPMF on the website. It includes a new post-processing algorithm called OCCUR. This algorithm outputs all occurrences of sequential patterns.

To use it, you need a sequence database and a set of patterns found by a sequential pattern mining algorithm such as CM-SPAM, with the parameter "show sequence ids" set to true. Then, by applying OCCUR, you will get all the occurrences.

The documentation is here:

http://www.philippe-fournier-viger.com/spmf/OCCUR.php

Hope it helps!

Best,

Philippe

Re: Obtain specific item IDs after obtaining frequent patterns in SPMF
Posted by: jbd
Date: January 27, 2019 11:38PM

Hi,

Great news. Thank you very much for this useful algorithm and for your help. It is appreciated.

Best regards,

Jbd



This forum is powered by Phorum and provided by P. Fournier-Viger (© 2012).