Using CM-Spam for very long sequences

The Data Mining Forum

IMPORTANT: This is the old Data Mining forum.
I keep it online so that you can read the old messages.

Please post your new messages in the new forum: https://forum2.philippe-fournier-viger.com/index.php

Posted by: Madiha Khan

Date: September 10, 2020 02:58AM

Hi,

I am new to SPMF and am exploring the use of sequential pattern mining for my dataset. I'm finding the tool very useful, but I have run into some difficulties using it with long sequences.

My research uses unstructured data from tutor-student interactions. I have labelled the data using a framework and now have a set of very long sequences, each corresponding to one tutoring session. The labels are positive integers, and I prepared the input file following the instructions on the SPMF website (i.e. every event is followed by a -1, and there is a -2 at the end of every sequence). Each sequence consists of many single events rather than itemsets, so I treated each event as its own itemset and followed each one with a -1.
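For reference, the format described above can be generated with a short script. This is a minimal Python sketch, assuming SPMF's convention of one sequence per line and assuming each session is already available as a list of positive integer labels; `to_spmf_line` and `write_spmf` are hypothetical helper names, not part of SPMF.

```python
def to_spmf_line(events):
    """Encode one session (a list of positive-integer labels) as one SPMF line.

    Each event is treated as its own itemset, so every label is followed
    by -1, and the whole sequence is terminated by -2.
    """
    if not events:
        raise ValueError("a sequence must contain at least one event")
    for e in events:
        if not isinstance(e, int) or e <= 0:
            raise ValueError(f"labels must be positive integers, got {e!r}")
    tokens = []
    for e in events:
        tokens.append(str(e))
        tokens.append("-1")  # close the single-event itemset
    tokens.append("-2")      # close the sequence
    return " ".join(tokens)


def write_spmf(sessions, path):
    """Write one sequence per line, with no line break inside a sequence."""
    with open(path, "w", encoding="utf-8") as f:
        for events in sessions:
            f.write(to_spmf_line(events) + "\n")
```

For example, `to_spmf_line([3, 1, 3])` returns `"3 -1 1 -1 3 -1 -2"`, however long the session is.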

When I run the CM-SPAM algorithm on a set of 8 sequences with a minsup of 50%, the output file shows a number of patterns with a support of either 3 or 2. Checking manually, I can see that the support should be higher for many of these patterns, which leads me to think that CM-SPAM is not reading the last 5 sequences for some reason. I'm not sure why this is. The sequences are very long and spill over onto multiple lines. Could this affect how the algorithm reads the file? If so, how should I adjust the file?
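One way to check supports by hand is to count how many sequences contain a candidate pattern in order, with other events allowed in between. The Python sketch below does this for patterns made of single-event itemsets, matching the encoding described above. It also converts a relative minsup into a number of sequences, assuming the percentage is rounded up: under that assumption, 50% of 8 sequences is 4, so patterns reported with support 2 or 3 would suggest that fewer sequences were read than expected. All function names are hypothetical helpers, not SPMF APIs.

```python
import math


def contains(sequence, pattern):
    """True if every event of `pattern` occurs in `sequence` in the same
    order (other events may appear in between)."""
    remaining = iter(sequence)
    # `item in remaining` consumes the iterator up to the first match,
    # so each later pattern item must be found after the previous one.
    return all(item in remaining for item in pattern)


def support(sequences, pattern):
    """Number of sequences containing the pattern at least once."""
    return sum(1 for seq in sequences if contains(seq, pattern))


def min_support_count(minsup_fraction, n_sequences):
    """Relative minsup as a sequence count (assumed rounding: ceiling)."""
    return math.ceil(minsup_fraction * n_sequences)
```

For example, with `sessions = [[1, 2, 3], [2, 1, 3], [1, 3], [3, 1]]`, `support(sessions, [1, 3])` is 3 (the last session has 3 before 1), and `min_support_count(0.5, 8)` is 4.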

Thanks in advance for your help.

Madiha

Re: Using CM-Spam for very long sequences

Posted by: webmasterphilfv

Date: October 04, 2020 10:32PM

Dear Madiha,

Thanks for using SPMF, and sorry for not answering earlier. Usually I receive an e-mail for each message posted on the forum, and I try to answer each message quickly, but somehow I did not see this notification.

I think the problem is most likely in the input file. It could be a bug, but since this algorithm has been used by many people, I think a formatting problem in the input file is more likely, such as a missing -2 at the end of a sequence.
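A quick way to look for such problems is to scan the file line by line before running the algorithm. The Python sketch below assumes the one-sequence-per-line convention and reports lines that do not end with -2, contain a -2 before the end, have a -2 not preceded by -1, or hold a non-positive item. Lines starting with #, % or @ are skipped as assumed comment or metadata lines. `check_spmf_lines` and `check_spmf_file` are hypothetical names, not part of SPMF.

```python
def check_spmf_lines(lines):
    """Return (number_of_sequences, problems) for SPMF-style input lines.

    Assumes one sequence per line: positive-integer items, -1 after each
    itemset, and -2 at the very end of the line. Each problem is a
    (line_number, description) tuple.
    """
    problems = []
    n_sequences = 0
    for lineno, raw in enumerate(lines, start=1):
        line = raw.strip()
        if not line or line[0] in "#%@":
            continue  # blank line or assumed comment/metadata line
        n_sequences += 1
        try:
            values = [int(tok) for tok in line.split()]
        except ValueError:
            problems.append((lineno, "contains a non-integer token"))
            continue
        if values[-1] != -2:
            problems.append((lineno, "does not end with -2"))
        elif len(values) < 2 or values[-2] != -1:
            problems.append((lineno, "-2 is not preceded by -1"))
        if -2 in values[:-1]:
            problems.append((lineno, "has a -2 before the end of the line"))
        if any(v <= 0 for v in values if v not in (-1, -2)):
            problems.append((lineno, "has an item that is not a positive integer"))
    return n_sequences, problems


def check_spmf_file(path):
    """Run check_spmf_lines over a file on disk."""
    with open(path, encoding="utf-8") as f:
        return check_spmf_lines(f)
```

Soft wrapping in a text editor only changes how the file is displayed, so it is not flagged here; a real line break inside a sequence would appear as a line that does not end with -2. Comparing the returned sequence count with the expected number of sessions is another quick check.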

If you have not found a solution yet, you can e-mail me directly at philfv8 AT yahoo DOT COM, and I will try your file on my computer to see what is happening. If you do so, please also tell me which parameters you are using and which patterns you think are incorrect.

Best regards,

Philippe