The Data Mining Forum
IMPORTANT: This is the old Data Mining forum.
I keep it online so that you can read the old messages.

Please post your new messages in the new forum: https://forum2.philippe-fournier-viger.com/index.php
 
Issues in using SPMF
Posted by: Shivani
Date: April 20, 2021 02:44AM

Sir,

I am Shivani, currently a sophomore in CSE (BTech + MTech) at IIT (BHU) Varanasi, India. I found your SPMF library quite useful, but I am stuck: I could not run the HUIM-ABC, HUIM-BPSO and HUIM-GA algorithms on the dataset http://www.philippe-fournier-viger.com/spmf/datasets/foodmart.txt (from your own website), which contains 4141 lines, or on other datasets. I wondered whether the size was the problem, but I guess it is not large enough to be an issue. If it is, please tell me the maximum number of lines these algorithms can support. Please let me know as soon as possible, as I urgently need this for my data mining projects.

Kindly enlighten me in this regard. Thank you.

Re: Issues in using SPMF
Date: April 20, 2021 07:34AM

Hi,

How have you set the parameter?
What is the error that you had?

There is no such thing as a maximum number of lines for these algorithms. The search space depends on the file, yes, but also on how you set the parameter(s). If you set the minutil threshold to a very low value, the algorithm may become much slower.

If you show the error that you got and tell me the parameters, then I could try to replicate the problem.

Best regards

Re: Issues in using SPMF
Posted by: Shivani
Date: April 20, 2021 02:03PM

I ran HUIM-ABC on the http://www.philippe-fournier-viger.com/spmf/datasets/foodmart.txt dataset with the command java -jar spmf.jar run HUIM-ABC food_mart.txt a1.txt 27000, but I got no output even after more than 45 minutes, so I had to force-stop it.

Re: Issues in using SPMF
Date: April 20, 2021 04:36PM

Hi,

I see. The algorithm is simply slow. Generally, the lower minutil is set, the slower the algorithms become, and the higher it is set, the faster they run.

For example, on that dataset, the algorithm terminates if I use a higher value. But if the value is too low, it takes a long time to terminate.
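To illustrate why a lower minutil means more work, here is a minimal brute-force sketch. It is not HUIM-ABC, EFIM, or any SPMF algorithm; it simply enumerates every itemset of a tiny made-up dataset and counts those whose utility is at least minutil. It assumes the SPMF high-utility input format (items : transaction utility : item utilities), and the class and method names are hypothetical.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.TreeSet;

/**
 * Hypothetical brute-force illustration (not an SPMF algorithm): counts the
 * itemsets whose total utility is >= minutil. Assumes each line looks like
 * "items:TU:utilities", e.g. "1 3:5:2 3".
 */
public class HuiCount {

    // Parse each line into a map: item -> utility of that item in the transaction.
    static List<Map<Integer, Integer>> parse(List<String> lines) {
        List<Map<Integer, Integer>> db = new ArrayList<>();
        for (String line : lines) {
            String[] parts = line.split(":");
            String[] items = parts[0].trim().split("\\s+");
            String[] utils = parts[2].trim().split("\\s+");
            Map<Integer, Integer> t = new HashMap<>();
            for (int i = 0; i < items.length; i++) {
                t.put(Integer.parseInt(items[i]), Integer.parseInt(utils[i]));
            }
            db.add(t);
        }
        return db;
    }

    // Enumerates all 2^n - 1 non-empty itemsets, so it is only feasible for tiny n.
    static int countHighUtilityItemsets(List<String> lines, long minutil) {
        List<Map<Integer, Integer>> db = parse(lines);
        TreeSet<Integer> distinct = new TreeSet<>();
        for (Map<Integer, Integer> t : db) distinct.addAll(t.keySet());
        List<Integer> items = new ArrayList<>(distinct);
        int n = items.size();
        int count = 0;
        for (long mask = 1; mask < (1L << n); mask++) {
            long utility = 0;
            for (Map<Integer, Integer> t : db) {
                // Utility of the itemset in t counts only if t contains every item of it.
                long inTransaction = 0;
                boolean containsAll = true;
                for (int b = 0; b < n && containsAll; b++) {
                    if ((mask & (1L << b)) == 0) continue;
                    Integer u = t.get(items.get(b));
                    if (u == null) containsAll = false;
                    else inTransaction += u;
                }
                if (containsAll) utility += inTransaction;
            }
            if (utility >= minutil) count++;
        }
        return count;
    }

    public static void main(String[] args) {
        List<String> toy = List.of("1 2 3:6:1 2 3", "1 3:5:2 3", "2 3:4:1 3");
        for (long minutil : new long[] {10, 9, 6, 3}) {
            System.out.println("minutil=" + minutil + " -> "
                + countHighUtilityItemsets(toy, minutil) + " itemsets");
        }
    }
}
```

On this toy data the count grows from 0 (minutil = 10) to 3, 4 and finally all 7 itemsets (minutil = 3). Real algorithms avoid full enumeration by pruning, and a lower minutil leaves less that can be pruned, which is why runtime grows as the threshold drops.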

There is not much to do about this. If the algorithm is too slow, you can use another algorithm.

For example, HUIM-BPSO-tree took about 25 seconds with minutil = 40000 on Foodmart.

But an exact algorithm such as EFIM takes only 45 ms.

So it depends on the dataset: how many items and lines it has, how long each line is, and whether the lines are similar to each other.
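To check the dataset characteristics listed above for a given file, a small helper like the following could be used. It is a hypothetical sketch, not an SPMF tool. It assumes the SPMF high-utility input format and assumes that lines starting with #, % or @ are comments or metadata. It uses density (average line length divided by the number of distinct items) as one rough proxy for how similar the lines are; it does not measure similarity directly.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

/**
 * Hypothetical helper (not part of SPMF): simple statistics of a file assumed
 * to be in the high-utility format "items:TU:utilities".
 */
public class DatasetStats {

    // Returns {lines, distinct items, max line length, average line length, density}.
    static double[] stats(List<String> lines) {
        Set<String> distinct = new HashSet<>();
        long totalLength = 0;
        int maxLength = 0;
        int count = 0;
        for (String line : lines) {
            line = line.trim();
            // Skip blank lines and (assumed) comment/metadata lines.
            if (line.isEmpty() || line.startsWith("#") || line.startsWith("%") || line.startsWith("@")) {
                continue;
            }
            // The item list is the part before the first ':'.
            String[] items = line.split(":")[0].trim().split("\\s+");
            for (String item : items) distinct.add(item);
            totalLength += items.length;
            maxLength = Math.max(maxLength, items.length);
            count++;
        }
        double avg = count == 0 ? 0 : (double) totalLength / count;
        // Density close to 1 means most lines contain most items (lines look alike).
        double density = distinct.isEmpty() ? 0 : avg / distinct.size();
        return new double[] {count, distinct.size(), maxLength, avg, density};
    }

    public static void main(String[] args) throws IOException {
        List<String> lines = Files.readAllLines(Path.of(args[0]));
        double[] s = stats(lines);
        System.out.printf("lines=%d distinct items=%d max length=%d avg length=%.2f density=%.4f%n",
            (long) s[0], (long) s[1], (long) s[2], s[3], s[4]);
    }
}
```

For example, java DatasetStats foodmart.txt would print these figures for that file; comparing them with the figures for another dataset gives a rough idea of why the same algorithm behaves so differently on each.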

It seems that for Foodmart, it is just better to use an exact algorithm than an approximate algorithm.

Best regards,

Re: Issues in using SPMF
Posted by: Shivani
Date: April 21, 2021 08:56AM

I cannot change the algorithms as those are part of my study.

I've tried other datasets too, but I still get no output from the HUIM-ABC algorithm. However, I got outputs from the HUIM-BPSO and HUIM-GA algorithms on the chess dataset. I guess something is wrong with the implementation of the HUIM-ABC algorithm.

Re: Issues in using SPMF
Date: April 26, 2021 11:12PM

Hi,

Sorry for the late reply. I was busy and did not check the forum for a few days.

I have not checked the code, but this is the original implementation of HUIM-ABC, so this is how HUIM-ABC behaves. It is still possible that something could be optimized. If you find an optimization or a bug, please let me know. You can also contact the authors of HUIM-ABC if you have questions about it.

Best regards,



This forum is powered by Phorum and provided by P. Fournier-Viger (© 2012).