Re: which is the new & efficient Algorithm for Mining Frequent Itemsets?

which is the new & efficient Algorithm for Mining Frequent Itemsets?

Posted by: Vu Duc Toan

Date: June 09, 2014 02:32AM

I have read the page http://www.philippe-fournier-viger.com/spmf/index.php?link=performance.php, in particular the 2012-11-13 entry "Which frequent itemset mining algorithm is the most efficient?"

It says that "FPGrowth seems to be clearly the best algorithm both in terms of execution time and memory usage",
but FPGrowth is an old algorithm (2000). Is there a newer algorithm (2013, 2014) that is not only faster than FPGrowth but also easy to implement?
Thanks


Re: which is the new & efficient Algorithm for Mining Frequent Itemsets?

Posted by: webmasterphilfv

Date: June 09, 2014 02:48AM

In general, FPGrowth-based algorithms are the best for frequent itemset mining.

FPGrowth is not very hard to implement and gives very good performance.

Although FPGrowth was published in 2000, many optimizations of FPGrowth are still being published and are reported to be faster.

For example, LPGrowth is very fast and memory efficient and was published in 2014:

Efficient frequent pattern mining based on linear prefix tree
G. Pyun, U. Yun, K. H. Ryu
Knowledge-Based Systems, Volume 55, January 2014, Pages 125–139

In that paper, it was compared with many other FPGrowth implementations and is reported to outperform them.

LCM is another very fast algorithm.

Easy to implement? If you want something fast, I don't think you will find an algorithm that is easier to implement than FPGrowth. Moreover, if you want something fast, you will need to spend time optimizing it: a poorly implemented algorithm may perform badly even if the algorithm itself is good.
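
To give an idea of the amount of work involved, here is a minimal, unoptimized sketch of FPGrowth in Python. It is only an illustration written for this thread, not the SPMF implementation and not the code of any paper cited here. The names `Node`, `build_tree` and `fpgrowth` are made up for the example, and `min_count` is assumed to be an absolute number of transactions rather than a percentage.

```python
from collections import defaultdict


class Node:
    """One FP-tree node: an item, a count, a parent link and children."""

    def __init__(self, item, parent):
        self.item = item
        self.count = 0
        self.parent = parent
        self.children = {}


def build_tree(weighted_transactions, min_count):
    """Build an FP-tree from (items, weight) pairs.

    Returns (header, frequent): header maps each frequent item to its
    tree nodes, frequent maps each frequent item to its support.
    """
    support = defaultdict(int)
    for items, weight in weighted_transactions:
        for item in set(items):
            support[item] += weight
    frequent = {i: s for i, s in support.items() if s >= min_count}

    root = Node(None, None)
    header = defaultdict(list)
    for items, weight in weighted_transactions:
        # Keep frequent items only, most frequent first (ties broken by item).
        path = sorted((i for i in set(items) if i in frequent),
                      key=lambda i: (-frequent[i], i))
        node = root
        for item in path:
            child = node.children.get(item)
            if child is None:
                child = Node(item, node)
                node.children[item] = child
                header[item].append(child)
            child.count += weight
            node = child
    return header, frequent


def fpgrowth(weighted_transactions, min_count, suffix=frozenset()):
    """Return {frozenset(itemset): support} for all frequent itemsets."""
    results = {}
    header, frequent = build_tree(weighted_transactions, min_count)
    for item, nodes in header.items():
        itemset = suffix | {item}
        results[itemset] = frequent[item]
        # Conditional pattern base: the prefix path above each node of
        # this item, weighted by that node's count.
        base = []
        for node in nodes:
            path = []
            parent = node.parent
            while parent is not None and parent.item is not None:
                path.append(parent.item)
                parent = parent.parent
            if path:
                base.append((path, node.count))
        results.update(fpgrowth(base, min_count, itemset))
    return results
```

A call such as `fpgrowth([(t, 1) for t in transactions], 2)` mines a list of transactions with weight 1 each. The sketch rebuilds a tree for every conditional pattern base and uses plain Python objects, which is exactly the kind of thing an optimized implementation would avoid.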

By the way, next week, I will release an updated version of SPMF. I am currently working on optimizing some algorithms. After optimizing my ECLAT implementation yesterday, I see that it is sometimes faster than my FPGrowth implementation.
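
For comparison, the core idea of ECLAT (a vertical layout where each item keeps the set of transaction ids that contain it, and supports are obtained by intersecting those sets) can be sketched in a few lines of Python. This is a plain illustration, not the optimized SPMF code; the function name `eclat` and the absolute threshold `min_count` are assumptions made for the example.

```python
def eclat(transactions, min_count):
    """Return {tuple(sorted itemset): support} using tidset intersections."""
    # Vertical layout: item -> set of ids of the transactions containing it.
    tidsets = {}
    for tid, items in enumerate(transactions):
        for item in items:
            tidsets.setdefault(item, set()).add(tid)

    results = {}

    def extend(prefix, candidates):
        # candidates: (item, tidset) pairs that are frequent together with prefix.
        for i, (item, tids) in enumerate(candidates):
            itemset = prefix + (item,)
            results[itemset] = len(tids)
            # Intersect only with later candidates so that every itemset
            # is generated exactly once.
            narrower = []
            for other, other_tids in candidates[i + 1:]:
                common = tids & other_tids
                if len(common) >= min_count:
                    narrower.append((other, common))
            if narrower:
                extend(itemset, narrower)

    start = sorted(
        ((item, tids) for item, tids in tidsets.items() if len(tids) >= min_count),
        key=lambda pair: pair[0],
    )
    extend((), start)
    return results
```

Real implementations usually replace the Python sets with bit vectors or diffsets; whether that makes ECLAT faster than FPGrowth depends on the dataset.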

Edited 4 time(s). Last edit at 06/09/2014 02:53AM by webmasterphilfv.


Re: which is the new & efficient Algorithm for Mining Frequent Itemsets?

Posted by: Vu Duc Toan

Date: June 16, 2014 11:40PM

Dear Prof. webmasterphilfv,
How can I get the source code for the article "Efficient frequent pattern mining based on linear prefix tree"? In the section "Performance evaluation", I saw that it outperforms FPGrowth.
Thanks


Re: which is the new & efficient Algorithm for Mining Frequent Itemsets?

Posted by: webmasterphilfv

Date: June 17, 2014 03:42AM

If you want the C++ source code, you may contact the authors of that paper. They may or may not be willing to share it.

Otherwise, you may implement it yourself if you really need it.

But do you really need something that fast?

If you don't, then you may use another algorithm instead.

Edited 1 time(s). Last edit at 06/17/2014 03:43AM by webmasterphilfv.


Re: which is the new & efficient Algorithm for Mining Frequent Itemsets?

Posted by: sinthu

Date: May 30, 2016 11:24PM

Vu Duc Toan Wrote:
-------------------------------------------------------
> Dear Prof webmasterphilfv
> How can I get the source code from article
> "Efficient frequent pattern mining based on linear
> prefix tree"? In section "Performance evaluation",
> I saw it outperforms FPGrowth
> Thanks


Re: which is the new & efficient Algorithm for Mining Frequent Itemsets?

Posted by: webmasterphilfv

Date: May 30, 2016 11:34PM

I did not implement that algorithm, so it is not offered in SPMF. If someone implements it and wants to share the code, it could be added to SPMF.


Re: which is the new & efficient Algorithm for Mining Frequent Itemsets?

Posted by: khairy

Date: September 14, 2014 06:55AM

Dear webmasterphilfv

As we know, Apriori and FP-Growth are well-known algorithms that are cited in most recently published papers, even in 2014. But what should we answer if we are asked during our viva voce why we used Apriori and FP-Growth for the comparison, although they date from 1993 and 2000 respectively? Thanks


Re: which is the new & efficient Algorithm for Mining Frequent Itemsets?

Posted by: webmasterphilfv

Date: September 15, 2014 05:39AM

Hi,

I would say that although they are old algorithms, you chose them because they probably remain the most popular algorithms for itemset mining. They are the algorithms that are explained in many data mining books and implemented in most data mining tools. I think that would be the main answer. You can acknowledge that newer algorithms exist, and add that the fastest ones are probably extensions of FPGrowth.

Best,


Re: which is the new & efficient Algorithm for Mining Frequent Itemsets?

Posted by: denny

Date: November 15, 2014 05:38AM

Please try Prepost and FIN. They are two new algorithms proposed in recent years and may be a good choice for you.

Prepost (http://info.scichina.com:8084/sciFe/EN/abstract/abstract508369.shtml)
Source code: http://www.cis.pku.edu.cn/faculty/system/dengzhihong/Source%20Code/prepost.cpp

FIN (http://www.sciencedirect.com/science/article/pii/S0957417414000463)
Source code: http://www.cis.pku.edu.cn/faculty/system/dengzhihong/Source%20Code/fin.cpp


Re: which is the new & efficient Algorithm for Mining Frequent Itemsets?

Posted by: LAK

Date: November 15, 2014 10:25AM

I compiled the first program (prepost.cpp) and ran it with the parameters: input.txt 0.4 1

where input.txt is:

1 3 4
2 3 5
1 2 3 5
2 5
1 2 3 5

and the output seems incorrect:

5 (1 1) 2
1 (2 2)
1 3 (2 2)
1 3 2 (1 1)
1 2 (1 1)
3 (3 2)
3 2 (2 1)
2 (3 1)


Re: which is the new & efficient Algorithm for Mining Frequent Itemsets?

Posted by: denny

Date: November 16, 2014 11:19PM

Please note:
in each transaction, the last number must be followed by a blank space.
That is, every transaction line ends with a blank space.
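
If it helps, an input file can be brought into that shape with a few lines of Python. This is only a sketch: the function name `normalize_transaction_lines` is made up, and it assumes that blank lines can be dropped and that a newline after the last transaction is acceptable to the program.

```python
def normalize_transaction_lines(text):
    """Return text in which every transaction line ends with exactly one space."""
    lines = []
    for line in text.splitlines():
        items = line.split()
        if items:  # drop blank lines (assumption: they carry no transaction)
            lines.append(" ".join(items) + " ")
    return "\n".join(lines) + "\n"
```

For example, `normalize_transaction_lines("1 3 4\n2 3 5")` returns `"1 3 4 \n2 3 5 \n"`.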


Re: which is the new & efficient Algorithm for Mining Frequent Itemsets?

Posted by: LAK

Date: November 17, 2014 06:44PM

I added a space at the end of each line and an extra empty line at the end of the file, but the output still looks incomplete.

parameters: input.txt 0.3 1

4 (1 1) 1 3
1 (3 2)
1 5 (2 1) 3 2
1 3 (3 2)
1 3 2 (2 1)
1 2 (2 1)
5 (4 2) 2
5 3 (3 1) 2
3 (4 2)
3 2 (3 1)
2 (4 1)

The supports are correct, but several itemsets seem to be missing, such as {1 3 5}, {2 3 5} and {1 2 3 5}.
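
One way to check which itemsets should be reported for such a small file is a brute-force count over all candidate itemsets. The sketch below is independent of the Prepost code; the name `frequent_itemsets` is made up, and `min_count` is an absolute number of transactions (2 is used here only as an example, since how the program rounds a relative minsup is not stated in this thread).

```python
from itertools import combinations


def frequent_itemsets(transactions, min_count):
    """Brute force: count every candidate itemset (small inputs only)."""
    sets = [frozenset(t) for t in transactions]
    items = sorted(set().union(*sets))
    result = {}
    for size in range(1, len(items) + 1):
        for candidate in combinations(items, size):
            count = sum(1 for t in sets if t.issuperset(candidate))
            if count >= min_count:
                result[candidate] = count
    return result


transactions = [[1, 3, 4], [2, 3, 5], [1, 2, 3, 5], [2, 5], [1, 2, 3, 5]]
reference = frequent_itemsets(transactions, 2)
# reference[(1, 3, 5)] == 2, reference[(2, 3, 5)] == 3, reference[(1, 2, 3, 5)] == 2
```

With this input, {1 3 5}, {2 3 5} and {1 2 3 5} have supports 2, 3 and 2, so they are expected somewhere in the result whenever the threshold is 2 transactions or lower.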


Re: which is the new & efficient Algorithm for Mining Frequent Itemsets?

Posted by: ZX

Date: November 18, 2014 06:37AM

1 5 (2 1) 3 2 stands for {1 2 3 5},
5 3 (3 1) 2 stands for {2 3 5}, and so on.
5 3 (3 1) 2 means that whenever 5 and 3 co-occur, 2 also appears.
With minsup = 0.4, my output is:
1 (3 2)
1 5 (2 1) 3 2
1 3 (3 2)
1 3 2 (2 1)
1 2 (2 1)
5 (4 2) 2
5 3 (3 1) 2
3 (4 2)
3 2 (3 1)
2 (4 1)
This differs from your output although we use the same code, and the same difference appears when minsup = 0.1.
I think your input file may be wrong.
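
To make this compact notation easier to check, the lines can be expanded into explicit itemsets. The sketch below follows the reading given in this post: the numbers before the parentheses are the itemset, the first number inside the parentheses is its support, and every item after the parentheses can be added without changing the support. The meaning of the second number in the parentheses is not explained in this thread, so the code ignores it. The function name `expand` is made up.

```python
from itertools import combinations


def expand(line):
    """Expand one compact output line into {frozenset(itemset): support}."""
    head, rest = line.split("(")
    numbers, tail = rest.split(")")
    base = [int(x) for x in head.split()]
    support = int(numbers.split()[0])  # second number is left uninterpreted
    extras = [int(x) for x in tail.split()]
    out = {}
    # Adding any subset of the trailing items keeps the same support.
    for k in range(len(extras) + 1):
        for combo in combinations(extras, k):
            out[frozenset(base) | frozenset(combo)] = support
    return out


zx_output = [
    "1 (3 2)", "1 5 (2 1) 3 2", "1 3 (3 2)", "1 3 2 (2 1)", "1 2 (2 1)",
    "5 (4 2) 2", "5 3 (3 1) 2", "3 (4 2)", "3 2 (3 1)", "2 (4 1)",
]
itemsets = {}
for line in zx_output:
    itemsets.update(expand(line))
```

Expanded this way, the ten lines above describe 15 itemsets, including {1 3 5}, {2 3 5} and {1 2 3 5}.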


Re: which is the new & efficient Algorithm for Mining Frequent Itemsets?

Posted by: Attiya

Date: June 11, 2016 10:11AM

Hello

I want to find the association between two objects. First, is there a newer method for computing term frequencies? I only know TF-IDF. Second, what is a new and efficient algorithm for association rule mining?

Thanks


Re: which is the new & efficient Algorithm for Mining Frequent Itemsets?

Posted by: webmasterphilfv

Date: June 11, 2016 10:21AM

For TF-IDF, I don't know. It can probably be calculated by writing a simple program.

For association rule mining, there exist many algorithms. It depends on what you want to do. For example, there are algorithms for finding rare rules, correlated patterns, etc. Besides, the basic association rule mining algorithms do not consider the order of words. You could also consider sequential pattern mining and sequential rule mining algorithms. Sequential rules are like association rules, but they take into account the order of words in a sentence, for example, which may be more interesting for text mining. You can try many of these algorithms in the SPMF library (link on top of this page, with many examples in the documentation).
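
As an illustration of how simple such a TF-IDF program can be, here is a short Python sketch. It uses one common variant (term frequency = count divided by document length, inverse document frequency = log(N / document frequency)); other weightings exist, and the function name `tf_idf` is made up for the example.

```python
import math
from collections import Counter


def tf_idf(documents):
    """documents: list of token lists. Returns one {term: weight} dict per document."""
    n = len(documents)
    # Document frequency: in how many documents each term occurs.
    df = Counter()
    for doc in documents:
        df.update(set(doc))
    weights = []
    for doc in documents:
        counts = Counter(doc)
        weights.append({
            term: (count / len(doc)) * math.log(n / df[term])
            for term, count in counts.items()
        })
    return weights
```

With this variant, a term that occurs in every document gets weight 0, since log(N / N) = 0.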

Edited 1 time(s). Last edit at 02/08/2017 06:34PM by webmasterphilfv.

