Graph datasets for frequent subgraph mining

Posted by: Dang Nguyen

Date: May 28, 2018 09:49PM

Hi All,

In addition to the two popular small graph datasets for frequent subgraph mining, namely the Chemical and Compound datasets, I have uploaded nine new large graph datasets at the following link:
https://github.com/nphdang/gSpan/tree/master/Data

These datasets include seven bio- and chemo-informatics datasets and two social network datasets. They are in the DIMACS format, the default format used by the gSpan algorithm.

If you need the details of the graph format, you can read Philippe's post at this link.
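
For anyone who wants to load the files programmatically, here is a minimal sketch of a reader in Python. It assumes the line-based layout commonly used with gSpan implementations: "t # <id>" starts a graph, "v <id> <label>" declares a vertex, and "e <src> <dst> <label>" declares an edge. Please check this against the actual files and Philippe's post before relying on it. The function name parse_graphs and the sample text are made up for illustration and are not part of any library or of the datasets.

```python
from typing import Dict, List


def parse_graphs(text: str) -> List[Dict]:
    """Parse graphs from text in the assumed gSpan-style layout.

    Each graph is returned as a dict with keys "id", "vertices"
    (vertex id -> label) and "edges" (list of (src, dst, label)).
    Labels are kept as strings.
    """
    graphs: List[Dict] = []
    current = None
    for raw in text.splitlines():
        parts = raw.split()
        if not parts:
            continue  # skip blank lines
        if parts[0] == "t":
            # Assumption: some files end with "t # -1" as a terminator.
            if parts[-1] == "-1":
                break
            current = {"id": int(parts[-1]), "vertices": {}, "edges": []}
            graphs.append(current)
        elif parts[0] == "v" and current is not None:
            current["vertices"][int(parts[1])] = parts[2]
        elif parts[0] == "e" and current is not None:
            current["edges"].append((int(parts[1]), int(parts[2]), parts[3]))
    return graphs


# Hypothetical sample with two tiny graphs (not taken from the datasets).
SAMPLE = """t # 0
v 0 2
v 1 2
v 2 6
e 0 1 1
e 1 2 3
t # 1
v 0 1
v 1 5
e 0 1 2
"""

graphs = parse_graphs(SAMPLE)
for g in graphs:
    print(g["id"], len(g["vertices"]), len(g["edges"]))
```

To read one of the downloaded files, pass its contents to the function, for example parse_graphs(open(path).read()), where path is wherever you saved the file.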

If you use these graph datasets in your papers or projects, please cite our paper as follows:
Dang Nguyen, Wei Luo, Tu Dinh Nguyen, Svetha Venkatesh, Dinh Phung (2018). Learning Graph Representation via Frequent Subgraphs. SDM 2018, San Diego, USA. SIAM, 306-314.

Cheers,
Dang Nguyen

Re: Graph datasets for frequent subgraph mining

Posted by: webmasterphilfv

Date: May 29, 2018 04:40AM

Dear Dang,

It is great that you are sharing this! I think it will be useful to many.

Thanks,

Philippe

Re: Graph datasets for frequent subgraph mining

Posted by: Dang Nguyen

Date: May 29, 2018 02:57PM

Yes, I think these datasets are very useful for people who are working on frequent subgraph mining and related topics.

From now on, one can test an algorithm on a diverse set of datasets. Three years ago, my paper was rejected because it only compared algorithms on two datasets, Chemical and Compound.

The characteristics of these datasets can be found in the paper:
Dang Nguyen, Wei Luo, Tu Dinh Nguyen, Svetha Venkatesh, Dinh Phung (2018). Learning Graph Representation via Frequent Subgraphs. SDM 2018, San Diego, USA. SIAM, 306-314.

Cheers,
Dang Nguyen