Re: Properties of Retail and Kosarak datasets?
Date: May 22, 2016 06:36PM
Statistics for the original Kosarak dataset are:
Number of transactions: approximately 990,000
Number of distinct items: 41270
Largest item id: 41270
Average transaction length: 8.09997151520906 (standard deviation: 23.624398349313864, variance: 558.1121973670637)
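Statistics like these can be recomputed with a short script. Below is a minimal Python sketch. It assumes a plain-text file with one transaction per line, where items are whitespace-separated integer ids, and it skips blank lines and lines starting with '#', '%' or '@'. It also assumes population variance (dividing by n). The function name `transaction_stats` is hypothetical, and the tool that produced the figures above may use slightly different conventions.

```python
import statistics


def transaction_stats(lines):
    """Summarize a transaction database given as an iterable of text lines.

    Assumed format: one transaction per line, items are whitespace-separated
    integer ids; blank lines and lines starting with '#', '%' or '@' are skipped.
    Variance and standard deviation are population values (divide by n).
    """
    lengths = []      # number of distinct items in each transaction
    items = set()     # every distinct item id seen in the database
    for line in lines:
        line = line.strip()
        if not line or line[0] in "#%@":
            continue
        transaction = {int(tok) for tok in line.split()}
        lengths.append(len(transaction))
        items.update(transaction)
    return {
        "transactions": len(lengths),
        "distinct_items": len(items),
        "smallest_item_id": min(items),
        "largest_item_id": max(items),
        "avg_length": statistics.fmean(lengths),
        "std_dev": statistics.pstdev(lengths),
        "variance": statistics.pvariance(lengths),
    }


if __name__ == "__main__":
    sample = ["1 2 3", "2 4", "# a comment line", "5"]
    print(transaction_stats(sample))
```

To process a real file, pass an open file object, e.g. `with open("kosarak.txt") as f: print(transaction_stats(f))`, where `kosarak.txt` is a hypothetical local path. With about 990,000 transactions, the difference between population and sample variance is negligible.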
Statistics for Retail are:
Number of transactions: 88162
Number of distinct items: 16470
Smallest item id: 0
Largest item id: 16469
Average transaction length: 10.305755314080896 (standard deviation: 8.162337991148334, variance: 66.62376148174341)
Average item support in the database: 55.165513054037646 (standard deviation: 568.0639597302245, variance: 322696.66234438214, min value: 1, max value: 50675)
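Here the support of an item is read as the number of transactions that contain it. Under that reading, the average item support equals the total number of item occurrences divided by the number of distinct items, and the Retail figures are consistent with it: 55.165513054037646 × 16470 ≈ 908,576, which matches 88162 × 10.305755314080896 ≈ 908,576. The sketch below computes these support statistics in Python. It assumes the same plain-text input (one transaction per line, whitespace-separated integer ids) and population variance; `item_support_stats` is a hypothetical name.

```python
import statistics
from collections import Counter


def item_support_stats(lines):
    """Support statistics over all distinct items of a transaction database.

    Support of an item = number of transactions containing it (each item is
    counted at most once per transaction). Assumed input: one transaction per
    line, whitespace-separated integer ids; variance is a population value.
    """
    support = Counter()
    for line in lines:
        line = line.strip()
        if not line or line[0] in "#%@":
            continue
        # A set ensures a repeated item in one transaction is counted once.
        support.update({int(tok) for tok in line.split()})
    counts = list(support.values())
    return {
        "avg_support": statistics.fmean(counts),
        "std_dev": statistics.pstdev(counts),
        "variance": statistics.pvariance(counts),
        "min": min(counts),
        "max": max(counts),
    }


if __name__ == "__main__":
    sample = ["1 2 3", "2 4", "2 3"]
    print(item_support_stats(sample))
```

The large standard deviation reported for Retail (about 568 against a mean of about 55) is what a skewed support distribution looks like: most items are rare, while a few (up to 50675 transactions) are very frequent.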
As for the average size of the maximal potentially large itemsets, I do not have that information. You would first need to define what a "maximal potentially large itemset" is; you could then write a program to calculate it.
Edited 6 time(s). Last edit at 05/22/2016 06:50PM by webmasterphilfv.