The Data Mining Forum
IMPORTANT: This is the old Data Mining forum.
I keep it online so that you can read the old messages.

Please post your new messages in the new forum: https://forum2.philippe-fournier-viger.com/index.php
 
Represent results of Sequential Patterns algorithms
Posted by: Veronica
Date: October 02, 2014 10:04PM

Hello, I'm finding sequential patterns using PrefixSpan, GSP, SPADE, LAPIN, SPAM, etc., and they work really well.
My dataset has about 125 sequences of 100 items each, so when I set the SUPPORT parameter to 0.9 I get more than 30000 result lines like the following (this example is from the SPAM algorithm):

1 -1 1 -1 1 -1 1 -1 1 -1 1 -1 1 -1 1 -1 1 -1 1 -1 1 -1 SUP: 114
1 -1 1 -1 1 -1 1 -1 1 -1 1 -1 1 -1 1 -1 1 -1 1 -1 1 -1 1 -1 SUP: 113
1 -1 1 -1 1 -1 1 -1 1 -1 1 -1 1 -1 1 -1 1 -1 1 -1 1 -1 1 -1 1 -1 SUP: 111
1 -1 1 -1 1 -1 1 -1 1 -1 1 -1 1 -1 1 -1 1 -1 1 -1 1 -1 1 -1 1 -1 1 -1 SUP: 111

How could I present these results in a document? There are too many sequences to list them all, so should I just include the 10 sequences with the highest SUP?

My idea was to create a table showing, for each algorithm, the sequences obtained, the SUPPORT and the maximum pattern length parameter, but I'm getting far too many sequences for that.

Re: Represent results of Sequential Patterns algorithms
Date: October 03, 2014 04:39AM

Hi,

I'm glad that you got some interesting results.

To answer your question, it depends on what your goal is.

For example, let's say that you are applying sequential pattern mining to a dataset of student data to find interesting patterns in the courses that students take.

In a document presenting the results, I would show perhaps five to ten patterns as examples and explain their meaning. These need not be the most frequent patterns, although you could choose the most frequent ones if you like. For example, for student data, you might have found an unexpected pattern such as:

STUDENT_TAKE_Computer_COURSE -> STUDENT_TAKE_PSYCHOLOGY_COURSE -> STUDENT_TAKE_PHYSIC_COURSE

Then I would try to explain in the text why these patterns are interesting.

Ideally, a pattern should be unexpected (something that we did not already know) and useful. So I could say that the previous pattern is unexpected, is new, and could be useful for improving the study programs offered to students.

You could also discuss in your text that you have found some very long patterns or some very frequent patterns and what it means for your data. Is it what you expected or not?
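
To pick such a handful of patterns out of a large result file, a small script can help. The following is only a minimal sketch in Python (not part of SPMF), assuming each output line looks like the ones quoted in the question: items separated by spaces, -1 closing each itemset, and the support after "SUP:" (with or without a leading "#"). The names parse_line, top_k and format_pattern and the sample lines are made up for illustration.

```python
import re
from typing import List, NamedTuple, Tuple


class Pattern(NamedTuple):
    itemsets: Tuple[Tuple[str, ...], ...]  # one tuple of items per itemset
    support: int                           # number of sequences containing the pattern


# Assumed line shape: "<items and -1 separators> SUP: <count>", '#' before SUP optional.
LINE_RE = re.compile(r"^(?P<body>.*?)\s*#?SUP:\s*(?P<sup>\d+)\s*$")


def parse_line(line: str) -> Pattern:
    """Parse one result line into its itemsets and its support count."""
    match = LINE_RE.match(line.strip())
    if match is None:
        raise ValueError(f"unrecognized pattern line: {line!r}")
    itemsets: List[Tuple[str, ...]] = []
    current: List[str] = []
    for token in match.group("body").split():
        if token == "-1":            # -1 closes the current itemset
            itemsets.append(tuple(current))
            current = []
        else:
            current.append(token)
    if current:                      # tolerate a missing trailing -1
        itemsets.append(tuple(current))
    return Pattern(tuple(itemsets), int(match.group("sup")))


def top_k(patterns: List[Pattern], k: int) -> List[Pattern]:
    """Return the k patterns with the highest support; ties go to longer patterns."""
    return sorted(patterns, key=lambda p: (-p.support, -len(p.itemsets)))[:k]


def format_itemset(itemset: Tuple[str, ...]) -> str:
    """Show a single item bare and several simultaneous items in braces."""
    return itemset[0] if len(itemset) == 1 else "{" + ", ".join(itemset) + "}"


def format_pattern(pattern: Pattern) -> str:
    """Render a pattern as 'a -> b -> c' for use in a report."""
    return " -> ".join(format_itemset(s) for s in pattern.itemsets)


# Made-up sample lines, only to exercise the functions above.
sample_lines = [
    "1 -1 2 -1 SUP: 114",
    "1 -1 2 -1 3 -1 SUP: 111",
    "2 3 -1 4 -1 #SUP: 120",
    "4 -1 SUP: 111",
]
patterns = [parse_line(line) for line in sample_lines]

for p in top_k(patterns, 3):
    print(f"{format_pattern(p):<20} support = {p.support}")

# Length is counted here as the number of itemsets in the pattern.
longest = max(patterns, key=lambda p: len(p.itemsets))
print("longest pattern:", format_pattern(longest))
```

The same parsed list can be used to report how many patterns were found, the longest pattern, or the top few per algorithm, which may be easier to put in a table than the raw output.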

Besides that, here are a few ideas:

- If you want to find fewer patterns, you could also set a maximum size. If I remember correctly, my SPAM implementation lets you set a maximum size for the patterns to be found.

- If you want to find only the most frequent patterns, you could also use TKS for mining the top-k sequential patterns. For example, if you give k = 100, TKS will find the 100 most frequent patterns.

- You may also consider using the HirateYamana algorithm in the GUI of SPMF. It lets you specify constraints on the patterns to be found, such as the maximum size, the maximum gap between items in a pattern, etc. It is designed for datasets with timestamps, but you could take a regular dataset and convert it to a dataset with timestamps (Example #86 in the latest version of SPMF).

- You could convert your dataset to a transaction database (Example 78) and then apply association rule mining algorithms

- You could try to mine sequential rules to see what kind of patterns you would get.
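
For the idea of converting the dataset to a transaction database, here is a rough sketch in Python of what such a conversion amounts to conceptually. It is not SPMF's own converter. It assumes the usual SPMF sequence format (items separated by spaces, -1 at the end of each itemset, -2 at the end of each sequence) and that lines starting with #, % or @ are comments or metadata. The function names are hypothetical.

```python
from typing import Iterable, List


def sequence_to_transaction(line: str) -> List[int]:
    """Flatten one sequence line into a sorted list of distinct items."""
    items = set()
    for token in line.split():
        if token in ("-1", "-2"):    # -1 ends an itemset, -2 ends the sequence
            continue
        items.add(int(token))        # a set drops repeated items
    return sorted(items)


def convert(lines: Iterable[str]) -> List[str]:
    """Convert sequence lines to transaction lines, one transaction per sequence."""
    transactions = []
    for line in lines:
        line = line.strip()
        # Assumption: blank lines and lines starting with #, % or @ carry no sequence.
        if not line or line[0] in "#%@":
            continue
        items = sequence_to_transaction(line)
        transactions.append(" ".join(str(item) for item in items))
    return transactions


# Made-up input with two sequences, only for illustration.
sequence_db = [
    "1 -1 1 2 3 -1 1 3 -1 4 -1 3 6 -1 -2",
    "1 4 -1 3 -1 2 3 -1 1 5 -1 -2",
]
for transaction in convert(sequence_db):
    print(transaction)
```

Note that flattening throws away both the order of the items and how often each item repeats, so a sequence made of the same item many times collapses to a single item. That is why association rules, rather than sequential patterns, are what you would mine afterwards.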

These are just a few ideas. I hope this helps.

Best,



Edited 1 time(s). Last edit at 02/08/2017 06:41PM by webmasterphilfv.

Re: Represent results of Sequential Patterns algorithms
Posted by: Veronica
Date: October 05, 2014 08:28AM

Thanks so much Philippe, your ideas indeed help!!

All the best to you!
Verónica



This forum is powered by Phorum and provided by P. Fournier-Viger (© 2012).