The Data Mining Forum
IMPORTANT: This is the old Data Mining forum.
I keep it online so that you can read the old messages.

Please post your new messages in the new forum: https://forum2.philippe-fournier-viger.com/index.php
 
Issue in My Dataset for HUSPM
Posted by: P N RAMESH
Date: March 29, 2020 06:12AM

I am working with a high-utility sequential pattern mining (HUSPM) algorithm. I got the following error for my dataset:

Index 5 out of bounds for length 5
at SPFM/ca.pfv.spmf.algorithms.sequentialpatterns.uspan.AlgoUSpan.runAlgorithm(AlgoUSpan.java:395)


My dataset is:

1[9] 2[7] -1 -2 SUtility:16
1[8] 3[8] 4[8] 5[8] -1 -2 SUtility:32
6[9] 7[8] 8[9] -1 -2 SUtility:26
9[8] -1 -2 SUtility:8
10[8] -1 -2 SUtility:8
11[8] 3[8] -1 12[6] 4[8] 5[8] -1 -2 SUtility:38
10[8] -1 -2 SUtility:8
6[6] 8[6] 13[8] -1 -2 SUtility:20
14[8] 15[8] 16[8] -1 -2 SUtility:24


If I remove the 6th sequence, it works.

Is there anything wrong with the 6th sequence?

Thanks in advance.

Re: Issue in My Dataset for HUSPM
Date: March 29, 2020 09:39AM

Good evening!

Thanks for using SPMF! The problem is the following:

It may not be explained clearly in the documentation, but the algorithm assumes that the items within an itemset are sorted in ascending order (e.g. 1, 2, 3, 4...). If that order is not respected, the algorithm may produce incorrect results or fail with an error, as it did here.
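To find which sequences break this assumption before running the algorithm, a small check like the following could help. This is only a sketch and not part of SPMF: the function name `unsorted_itemsets` is made up for illustration, and it assumes the input format shown in this thread (tokens of the form `item[utility]`, `-1` ending an itemset, `-2` ending the sequence). Treating a repeated item id within an itemset as a violation is also an assumption.

```python
def unsorted_itemsets(line):
    """Return the itemsets (as lists of item ids) that are not in strictly ascending order."""
    bad = []
    current = []  # item ids of the itemset being read
    # A trailing "-1" sentinel makes sure a last, unterminated itemset is also checked.
    for tok in line.split() + ["-1"]:
        if "[" in tok:
            # Item token such as "11[8]": the id is the part before "[".
            current.append(int(tok.split("[")[0]))
        else:
            # Any other token (-1, -2, SUtility:...) closes the current itemset.
            if any(a >= b for a, b in zip(current, current[1:])):
                bad.append(current)
            current = []
    return bad


if __name__ == "__main__":
    seq6 = "11[8] 3[8] -1 12[6] 4[8] 5[8] -1 -2 SUtility:38"
    print(unsorted_itemsets(seq6))
```

Run on the 6th sequence of the dataset above, this reports both of its itemsets; the other sequences in the dataset yield an empty list.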

So this sequence:
11[8] 3[8] -1 12[6] 4[8] 5[8] -1 -2 SUtility:38

should be replaced by:

3[8] 11[8] -1 4[8] 5[8] 12[6] -1 -2 SUtility:38

so that items are in ascending order.

I will explain this more clearly in the documentation. Why this order? Because it allows the algorithm to perform some optimizations.

Then it works.
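For larger files, the reordering can be done automatically instead of by hand. The sketch below is not an SPMF tool: `sort_itemsets` is a hypothetical helper that assumes the same line format as in this thread and simply sorts the items of each itemset by item id, leaving utilities, separators and the `SUtility` field untouched.

```python
def sort_itemsets(line):
    """Return the sequence line with the items of each itemset sorted by ascending item id."""
    out = []      # tokens of the rewritten line
    itemset = []  # (item_id, original_token) pairs of the current itemset
    for tok in line.split():
        if "[" in tok:
            # Item token such as "11[8]": keep the token, sort later by its numeric id.
            itemset.append((int(tok.split("[")[0]), tok))
        else:
            # -1, -2 or SUtility:... : flush the current itemset in sorted order first.
            out.extend(t for _, t in sorted(itemset))
            itemset = []
            out.append(tok)
    # Flush any items left over if the line did not end with a separator.
    out.extend(t for _, t in sorted(itemset))
    return " ".join(out)


if __name__ == "__main__":
    print(sort_itemsets("11[8] 3[8] -1 12[6] 4[8] 5[8] -1 -2 SUtility:38"))
    # prints: 3[8] 11[8] -1 4[8] 5[8] 12[6] -1 -2 SUtility:38
```

Applying the function to every line of the input file before passing it to the algorithm would produce the corrected sequence shown above; lines that are already ordered come out unchanged.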



This forum is powered by Phorum and provided by P. Fournier-Viger (© 2012).