Re: What are best sampling methods for top-k Frequent Subgraph Mining
Date: April 05, 2020 04:28PM
Hi,
I think that there are a few ways of doing samplings depending on what you want to do.
1) You could do sampling on the input graphs to reduce the size of input database. For example, an input database has 300 graphs, but you decide to do sampling and use only 200 of those.
2) You could do sampling to reduce the search space during the search for frequent subgraphs. Instead of checking all subgraphs in the search space, you will skip some of them. But then, the result would become approximate.
3) Or among the patterns that you have found, you may sample them to get a small set that is representative of all the patterns.
So I think first, you should think about what is your goal about using sampling. Then, if it is clear, you can compare the sampling techniques for either of these goals.
I dont remember the details of the different sampling techniques so I cannot help you much. But I think some of them may be covered in some data mining textbooks or otherwise in some research papers.
I assume that using sampling, a new approach would be a success if it can considerably reduce the runtime but at the same time give you some result that is quite accurate. And perhaps that you also want to control that error, or guarantee the maximum error if you can...
By the way, you can start from the source code of TKG in SPMF and perhaps modify it for your project. It does the Top-k sugraph mining.
Hope this gives you some ideas.
Best regards,
Philippe