Mining Frequent Itemsets from Secondary Memory


Posted by: Marcel Schulze

Date: November 03, 2013 04:27AM

Hello,

I would like to ask how I should implement a frequent itemset mining algorithm that uses secondary memory (disk) or disk partitioning for very large databases that cannot fit in main memory.

Have you implemented any algorithms that could find frequent itemsets within limited main memory in Java?

Thank you very much for your great open-source data mining software.

Best Regards,
Marcel Schulze


Re: Mining Frequent Itemsets from Secondary Memory

Posted by: webmasterphilfv

Date: November 03, 2013 06:26AM

Hi,

Thanks. I'm glad that the software is useful.

I have not implemented any such algorithms. These papers seem to be relevant:

Mining Top-K Frequent Patterns in the Presence of the Memory Constraint
http://arbor.ee.ntu.edu.tw/~doug/paper/PPL/mtk.pdf

Mining Frequent Itemsets from Secondary Memory (based on FPGrowth)
http://users.encs.concordia.ca/~grahne/papers/icdm04.pdf

Best,

Philippe

Edited 4 time(s). Last edit at 11/03/2013 06:29AM by webmasterphilfv.


Re: Mining Frequent Itemsets from Secondary Memory

Posted by: Marcel Schulze

Date: November 03, 2013 08:28AM

Hi again,

Thanks for your quick response.

I had already read the second paper, and I have now also looked at the first one. They are exactly what I'm looking for. However, I didn't understand the partitioning and merging method, so I couldn't find a way to implement these algorithms.

For instance, if we have a database with 100,000,000 transactions, can we simply divide it into 1000 databases each with 100,000 transactions? If so, how can we merge the processed partitions (sub-databases) to generate frequent itemsets?
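One well-known way to make this kind of split work is the two-pass Partition approach (Savasere, Omiecinski and Navathe, VLDB 1995), which is not necessarily what the papers linked above do. The idea: mine each chunk with the same relative minimum support, take the union of the locally frequent itemsets as global candidates, then re-read the whole file once to count the candidates exactly. This is sound because an itemset that is frequent in the full database must be frequent in at least one chunk. The following is only a minimal sketch under several assumptions: the class and method names are hypothetical (not part of SPMF), the input is a text file with one transaction of space-separated items per line, the candidate set fits in main memory, and the local miner is a brute-force stand-in suitable only for toy data.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.*;

/** Two-pass partition sketch. All names here are hypothetical, not an SPMF API. */
class PartitionSketch {

    /** Parses one line of space-separated items into a transaction. */
    static Set<String> parse(String line) {
        return new TreeSet<>(Arrays.asList(line.trim().split("\\s+")));
    }

    /**
     * Stand-in local miner: counts every non-empty subset of every transaction
     * in the chunk. Exponential in transaction length, so only for toy data;
     * a real implementation would call Apriori or FP-Growth here (assumption).
     */
    static Set<Set<String>> mineLocal(List<Set<String>> chunk, double minSup) {
        Map<Set<String>, Integer> counts = new HashMap<>();
        for (Set<String> t : chunk) {
            List<String> items = new ArrayList<>(t);
            for (int mask = 1; mask < (1 << items.size()); mask++) {
                Set<String> subset = new TreeSet<>();
                for (int i = 0; i < items.size(); i++) {
                    if ((mask & (1 << i)) != 0) subset.add(items.get(i));
                }
                counts.merge(subset, 1, Integer::sum);
            }
        }
        // Keep itemsets that reach the relative threshold inside this chunk.
        Set<Set<String>> frequent = new HashSet<>();
        for (Map.Entry<Set<String>, Integer> e : counts.entrySet()) {
            if (e.getValue() >= minSup * chunk.size()) frequent.add(e.getKey());
        }
        return frequent;
    }

    /** Returns the globally frequent itemsets with their exact support counts. */
    static Map<Set<String>, Integer> mine(Path file, int chunkSize, double minSup)
            throws IOException {
        Set<Set<String>> candidates = new HashSet<>();
        int total = 0;
        // Pass 1: hold only one chunk in memory at a time and mine it locally.
        try (BufferedReader in = Files.newBufferedReader(file)) {
            List<Set<String>> chunk = new ArrayList<>();
            String line;
            while ((line = in.readLine()) != null) {
                if (line.trim().isEmpty()) continue;
                chunk.add(parse(line));
                total++;
                if (chunk.size() == chunkSize) {
                    candidates.addAll(mineLocal(chunk, minSup));
                    chunk.clear();
                }
            }
            if (!chunk.isEmpty()) candidates.addAll(mineLocal(chunk, minSup));
        }
        // Pass 2: stream the file again and count every candidate exactly.
        Map<Set<String>, Integer> counts = new HashMap<>();
        try (BufferedReader in = Files.newBufferedReader(file)) {
            String line;
            while ((line = in.readLine()) != null) {
                if (line.trim().isEmpty()) continue;
                Set<String> t = parse(line);
                for (Set<String> c : candidates) {
                    if (t.containsAll(c)) counts.merge(c, 1, Integer::sum);
                }
            }
        }
        // Discard candidates that were only locally frequent.
        final int n = total;
        counts.values().removeIf(v -> v < minSup * n);
        return counts;
    }

    public static void main(String[] args) throws IOException {
        Path file = Files.createTempFile("transactions", ".txt");
        Files.write(file, Arrays.asList("a b c", "a b", "a c", "b c", "a b c", "d"));
        System.out.println(mine(file, 2, 0.5));
        Files.delete(file);
    }
}
```

On this toy input (6 transactions, chunks of 2, minimum support 50%), the sketch keeps {a}, {b} and {c} with support 4 and the three pairs with support 3. {a, b, c} and {d} are frequent in one chunk but are dropped in the second pass. So "merging" here does not mean adding up partial results. It means collecting candidates and verifying them against the full data.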

I have also found two similar algorithms: SaM (Split and Merge) by Dr. Borgelt, and an Apriori variant that uses a divide-and-conquer method. But I have the same problem with both of them.

I would appreciate your support and guidance on this issue.

Best Regards,
Marcel Schulze


Re: Mining Frequent Itemsets from Secondary Memory

Posted by: webmasterphilfv

Date: November 03, 2013 09:57AM

Hi,

Unfortunately, I have not read these papers, and I am currently too busy to read articles unrelated to my current projects. So I cannot help much in explaining how these algorithms work.

Best,

Philippe


Re: Mining Frequent Itemsets from Secondary Memory

Posted by: Silva

Date: November 09, 2013 11:56AM

This paper may give additional information:

DRFP-tree: disk-resident frequent pattern tree

Silva


Re: Mining Frequent Itemsets from Secondary Memory

Posted by: Marcel Schulze

Date: November 11, 2013 05:46AM

Hi,

Thank you Silva. I will take a look.

Regards,
Marcel
