The Data Mining Forum                             open-source data mining software data mining conferences Data Science for Social and Behavioral Analytics DSSBA 2022 data science journal
IMPORTANT: This is the old Data Mining forum.
I keep it online so that you can read the old messages.

Please post your new messages in the new forum:
LAPIN Algorithm for sequential pattern mining
Posted by: Ashish kumar rai
Date: June 25, 2014 04:26AM


I came to know about LAPIN implementation. Is this algorithm is faster than prefixspan, spade, spam, gsp.


Edited 1 time(s). Last edit at 06/25/2014 06:42AM by webmasterphilfv.

Options: ReplyQuote
Re: Lapin Algorithm
Date: June 25, 2014 05:15AM

In the research papers of LAPIN, it claims to be faster than PrefixSpan and SPAM.

However, in my Java implementation in the SPMF Java Data Mining library, it is not always faster. Actually, I have done a few experiments and it seems to have comparable speed to SPAM and faster than PrefixSpan. In SPMF, the algorithm that is clearly the fastest is CM-SPADE (it is much faster than LAPIN.

Some comparison of CM-SPADE with other algorithms can be found here:

Note that LAPIN has not been included yet in this comparison but I say that based on some quick experiments that I have done to compare the performance.

Edited 1 time(s). Last edit at 06/25/2014 05:16AM by webmasterphilfv.

Options: ReplyQuote
Re: Lapin Algorithm
Posted by: Ashish
Date: June 25, 2014 06:16AM

some optimization can be done to make it faster according to documentation given in the spmf. Is the optimize is already done?

Options: ReplyQuote
Re: Lapin Algorithm
Date: June 25, 2014 06:34AM

A few days ago, I have added one or two optimizations to reduce the memory usage.

Overall, I have did my best to optimize it and I have tried to follow the paper as accurately as possible. But there may be some other optimizations that I have not thought about or that are not described in the paper.

Note that in the original LAPIN algorithm in C++, their implementation has some limitations. For example, it cannot handle sequences containing more than 64 items, if I remember well. The reason for this is probably to add some additional optimizations. However, in SPMF, I don't want to put any such constraints because I think that the user should be able to have as many items as he wants in its input database, and to make fair comparisons with other algorithms that do not have such implementation constraint.

Overall, what is good in LAPIN is that it tracks when is the last position of items in each sequence to avoid considering items that appear after the current position. However, this kind of strategy probably works well only for very long sequences. Besides, a limitation of LAPIN is that it has to explore a very large search space, and may consider many infrequent patterns or patterns not appearing in the database. The reason is that it is based on SPAM. Just like SPAM, LAPIN grows patterns by appending items from a list of items that may or may not appear after a pattern (items not appearing are then removed by last position during support counting but this is costly). Another limitation of LAPIN is that it is a very difficult algorithm to implement (much harder than algorithms like SPAM or SPADE).

In CM-SPADE/CM-SPAM (fastests algorithms in SPMF), we have handled the problem of generating too many candidates by using co-occurrences information between items. This is the reason why CM-SPADE/CM-SPAM are so fast. These techniques could probably also be combined with LAPIN. But it would not be the original LAPIN anymore (we should call it CM-LAPIN) and I have doubts that it could beat CM-SPADE which is quite fast.

Edited 6 time(s). Last edit at 06/25/2014 06:40AM by webmasterphilfv.

Options: ReplyQuote
Re: LAPIN Algorithm for sequential pattern mining
Posted by: Ashish kumar rai
Date: June 25, 2014 07:16AM

Thank You Sir for guiding me about lapin.

Options: ReplyQuote

This forum is powered by Phorum and provided by P. Fournier-Viger (© 2012).
Terms of use.