Hi All,
In addition to the two popular small graph datasets for frequent subgraph mining, namely the Chemical and Compound datasets, I have uploaded nine new large graph datasets at the following link:
https://github.com/nphdang/gSpan/tree/master/Data
These include seven bio- and chemo-informatics datasets and two social network datasets. They are in DIMACS format, the default format used by the gSpan algorithm.
If you need details of the graph format, you can read Philippe's post at this link.
If you use these graph datasets in your papers or projects, please cite our paper as follows:
Dang Nguyen, Wei Luo, Tu Dinh Nguyen, Svetha Venkatesh, Dinh Phung (2018). Learning Graph Representation via Frequent Subgraphs. SDM 2018, San Diego, USA. SIAM, 306-314.
Cheers,
Dang Nguyen