The Data Mining Forum
 
Mining Frequent Itemsets from Secondary Memory
Posted by: Marcel Schulze
Date: November 03, 2013 04:27AM

Hello,

I would like to ask how I should implement a frequent itemset mining algorithm that uses secondary memory (disk) or disk partitioning for very large databases that cannot fit in main memory.

Have you implemented any algorithms that could find frequent itemsets within limited main memory in Java?

Thank you very much for your great open-source data mining software.

Best Regards,
Marcel Schulze

Re: Mining Frequent Itemsets from Secondary Memory
Date: November 03, 2013 06:26AM

Hi,

Thanks. I'm glad that the software is useful.

I have not implemented any such algorithms. These papers seem to be relevant:

Mining Top-K Frequent Patterns in the Presence of the Memory Constraint
http://arbor.ee.ntu.edu.tw/~doug/paper/PPL/mtk.pdf

Mining Frequent Itemsets from Secondary Memory (based on FPGrowth)
http://users.encs.concordia.ca/~grahne/papers/icdm04.pdf

Best,

Philippe



Edited 4 time(s). Last edit at 11/03/2013 06:29AM by webmasterphilfv.

Re: Mining Frequent Itemsets from Secondary Memory
Posted by: Marcel Schulze
Date: November 03, 2013 08:28AM

Hi again,

Thanks for your quick response.

I had already read the second paper, and I have now looked at the first one as well. They are exactly what I am looking for. However, I did not understand the partitioning and merging method, so I could not find a way to implement these algorithms.

For instance, if we have a database with 100,000,000 transactions, can we simply divide it into 1000 databases each with 100,000 transactions? If so, how can we merge the processed partitions (sub-databases) to generate frequent itemsets?

I have also found two similar algorithms. The first is SaM (Split and Merge) by Dr. Borgelt, and the second is Apriori using a divide-and-conquer method. However, I have the same problem with these two.

I would appreciate your support and guidance on this issue.

Best Regards,
Marcel Schulze

Re: Mining Frequent Itemsets from Secondary Memory
Date: November 03, 2013 09:57AM

Hi,

Unfortunately, I have not read these papers, and I am currently too busy to read articles that are not related to my current projects. So I cannot be of much help in explaining how these algorithms work.

Best,

Philippe

Re: Mining Frequent Itemsets from Secondary Memory
Posted by: Silva
Date: November 09, 2013 11:56AM

This paper may provide additional information:

DRFP-tree: disk-resident frequent pattern tree

Silva

Re: Mining Frequent Itemsets from Secondary Memory
Posted by: Marcel Schulze
Date: November 11, 2013 05:46AM

Hi,

Thank you, Silva. I will take a look.

Regards,
Marcel
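
For readers who arrive here with the same question about splitting a database and merging the per-partition results: one well-known answer is the two-pass Partition idea (Savasere, Omiecinski and Navathe, VLDB 1995). It is not the method of the papers linked in this thread, so take the following only as a rough sketch under that assumption. In pass 1, each partition is mined on its own with the same relative minimum support. The union of the locally frequent itemsets is then used as the candidate set, because an itemset that is frequent in the whole database must be frequent in at least one partition. In pass 2, the whole database is scanned once more to count the candidates exactly. The class name PartitionSketch and the methods mine, localFrequent and support are hypothetical names chosen for this illustration, and the partitions are in-memory sublists standing in for chunks that would really be read from disk one at a time.

```java
import java.util.*;

class PartitionSketch {

    // Number of transactions that contain every item of the itemset.
    static int support(List<Set<String>> txs, Set<String> itemset) {
        int n = 0;
        for (Set<String> t : txs) if (t.containsAll(itemset)) n++;
        return n;
    }

    // Naive level-wise (Apriori-style) miner for ONE partition that fits in memory.
    static Set<Set<String>> localFrequent(List<Set<String>> part, int minCount) {
        Set<Set<String>> result = new HashSet<>();
        Set<Set<String>> level = new HashSet<>();
        for (Set<String> t : part) for (String i : t) level.add(Set.of(i));
        while (!level.isEmpty()) {
            Set<Set<String>> frequent = new HashSet<>();
            for (Set<String> c : level)
                if (support(part, c) >= minCount) frequent.add(c);
            result.addAll(frequent);
            // Join step: unions of two frequent k-itemsets that have k+1 items.
            Set<Set<String>> next = new HashSet<>();
            for (Set<String> a : frequent)
                for (Set<String> b : frequent) {
                    Set<String> u = new TreeSet<>(a);
                    u.addAll(b);
                    if (u.size() == a.size() + 1) next.add(u);
                }
            level = next;
        }
        return result;
    }

    // Two-pass mining; parts must be >= 1, minSup is a ratio in (0, 1].
    static Map<Set<String>, Integer> mine(List<Set<String>> db, int parts, double minSup) {
        int size = (db.size() + parts - 1) / parts; // transactions per partition
        Set<Set<String>> candidates = new HashSet<>();
        // Pass 1: mine each partition with the same relative threshold.
        for (int start = 0; start < db.size(); start += size) {
            List<Set<String>> part = db.subList(start, Math.min(start + size, db.size()));
            int localMin = (int) Math.ceil(minSup * part.size());
            candidates.addAll(localFrequent(part, localMin));
        }
        // Pass 2: "merge" = count every candidate over the whole database.
        int globalMin = (int) Math.ceil(minSup * db.size());
        Map<Set<String>, Integer> result = new HashMap<>();
        for (Set<String> c : candidates) {
            int s = support(db, c);
            if (s >= globalMin) result.put(c, s);
        }
        return result;
    }

    public static void main(String[] args) {
        List<Set<String>> db = List.of(
            Set.of("a", "b"), Set.of("a", "b", "c"), Set.of("a", "c"), Set.of("b"));
        System.out.println(mine(db, 2, 0.5));
    }
}
```

So the merge step is not a merge of trees or of local counts. The local results only decide which itemsets are worth counting, and the final supports come from the second scan. With 1000 partitions the candidate set can become large, since an itemset only needs to be frequent in one partition to be counted globally.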



This forum is powered by Phorum and provided by P. Fournier-Viger (© 2012).