The Data Mining Forum
IMPORTANT: This is the old Data Mining forum.
I keep it online so that you can read the old messages.

Please post your new messages in the new forum: https://forum2.philippe-fournier-viger.com/index.php
 
About GoKrimp, its performance and the SignTest
Posted by: vrodriguezf
Date: January 10, 2019 01:50AM

Hi,

I want to use GoKrimp to compress my sequence database with a set of meaningful patterns. This is an extract of the input file I am using, in case anyone wants to reproduce my case study:
link to the input file

I am currently running GoKrimp for the first time with this input file, and it is taking a while to execute (it has not finished yet). After reading the tips and performance information in the documentation of the algorithm, I have two questions:

1. The documentation says: "GoKrimp is very efficient. It output one pattern in each step so you can terminate the algorithm at anytime." Is there a way to monitor how many patterns have been output during the execution of the algorithm?

2. Regarding the SignTest used in GoKrimp, the documentation says "If you have a very long sequence instead of a database of many sequences, you should split the long sequences into a set of short sequences.". This is indeed my case. I have a sequence database composed of 1134 sequences, with an average length of 1466.60. I am wondering what a good sequence length is for this algorithm, and what the negative consequences of splitting the database are, in terms of pattern consistency.
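
As an illustration of the splitting step mentioned in point 2, here is a minimal sketch, not taken from SPMF or from the GoKrimp authors. It assumes the usual SPMF sequence format, where each itemset ends with -1 and each line ends with -2. The class `SequenceSplitter`, the method `split`, and the parameter `maxItemsets` are hypothetical names, and the chunk length is something you would have to choose yourself. One consequence that can be stated with certainty: with non-overlapping chunks, any pattern occurrence that spans a chunk boundary is cut in two.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical helper (not part of SPMF): cuts one long sequence into shorter ones. */
public class SequenceSplitter {

    /**
     * Splits one line in the assumed SPMF sequence format ("... -1 ... -1 -2")
     * into consecutive, non-overlapping chunks of at most maxItemsets itemsets.
     * Items that are not followed by a -1 separator are ignored.
     */
    public static List<String> split(String line, int maxItemsets) {
        if (maxItemsets <= 0) {
            throw new IllegalArgumentException("maxItemsets must be positive");
        }
        List<String> chunks = new ArrayList<>();
        StringBuilder current = new StringBuilder();
        int itemsetsInChunk = 0;

        for (String token : line.trim().split("\\s+")) {
            if (token.isEmpty()) {
                continue; // blank input line
            }
            if (token.equals("-2")) {
                break; // end of the sequence
            }
            current.append(token).append(' ');
            if (token.equals("-1")) {
                itemsetsInChunk++;
                if (itemsetsInChunk == maxItemsets) {
                    // Close the current chunk as its own sequence.
                    chunks.add(current + "-2");
                    current.setLength(0);
                    itemsetsInChunk = 0;
                }
            }
        }
        if (itemsetsInChunk > 0) {
            chunks.add(current + "-2"); // last, possibly shorter, chunk
        }
        return chunks;
    }

    public static void main(String[] args) {
        for (String chunk : split("1 -1 2 -1 3 -1 -2", 2)) {
            System.out.println(chunk);
        }
    }
}
```

Each returned string can be written as one line of a new input file. Whether cutting occurrences at chunk boundaries matters in practice depends on the data and on how the sign test is applied, which this sketch does not address.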

Best!



Edited 2 time(s). Last edit at 01/10/2019 01:51AM by vrodriguezf.

Re: About GoKrimp, its performance and the SignTest
Date: January 10, 2019 07:00AM

Hi again,

For 1), the feature is not implemented in SPMF, but it should be easy to add. Basically, it would require adding a counter, an if statement and a System.out.println() statement, I think (I did not look at the code). If you need that feature, I think I could add it quite quickly because it should be simple. If you tell me that you want it, I will try to do it.
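
The counter idea described above could look roughly like the following. This is only a sketch under assumptions, not the actual SPMF code: the class `PatternProgressLogger`, its method `onPatternFound`, and the parameter `reportEvery` are hypothetical names, and where exactly the call would go inside the GoKrimp implementation is not known here.

```java
import java.io.PrintStream;

/** Hypothetical helper (not part of SPMF): counts patterns as they are output. */
public class PatternProgressLogger {
    private final int reportEvery;   // print a message every N patterns
    private final PrintStream out;   // where progress messages go
    private long patternCount = 0;   // the counter

    public PatternProgressLogger(int reportEvery, PrintStream out) {
        if (reportEvery <= 0) {
            throw new IllegalArgumentException("reportEvery must be positive");
        }
        this.reportEvery = reportEvery;
        this.out = out;
    }

    /** To be called once each time the algorithm outputs a pattern. */
    public long onPatternFound() {
        patternCount++;
        // The if statement: only report at the chosen interval.
        if (patternCount % reportEvery == 0) {
            out.println("Patterns output so far: " + patternCount);
        }
        return patternCount;
    }

    public long getPatternCount() {
        return patternCount;
    }

    public static void main(String[] args) {
        // Simulates an algorithm that outputs five patterns.
        PatternProgressLogger logger = new PatternProgressLogger(2, System.out);
        for (int i = 0; i < 5; i++) {
            logger.onPatternFound();
        }
    }
}
```

With `reportEvery` set to 1, a message is printed for every pattern, which matches the "one pattern in each step" behaviour quoted from the documentation.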

For 2), it would be best to ask the first author of the paper who provided the code to SPMF.

Best regards,

Philippe



Edited 1 time(s). Last edit at 01/10/2019 07:01AM by webmasterphilfv.

Re: About GoKrimp, its performance and the SignTest
Posted by: vrodriguezf
Date: January 10, 2019 07:03AM

Hi Philippe,

Many thanks for your help. I am not in a rush with this analysis, so take your time adding that small logging functionality. Regarding point 2), I will contact the author of the paper.

Best!



Edited 1 time(s). Last edit at 01/10/2019 07:04AM by vrodriguezf.

Re: About GoKrimp, its performance and the SignTest
Posted by: Rémi Adon
Date: March 28, 2021 07:46AM

Hi there,

Any update concerning the sign test?

I am implementing a version of GoKrimp, and my v0 has no sign test (this test was not described in the early version of the paper).

I am trying to get a grasp of how this test impacts robustness on different datasets. Also, if users need to understand the inner workings of the algorithm to use it, that's definitely a pain point.

Cheers,
Rémi

Re: About GoKrimp, its performance and the SignTest
Date: April 09, 2021 05:31PM

Hi, just curious: have you succeeded in finishing your implementation and getting feedback from the authors?

Best regards,

Philippe


