The Data Mining Forum
IMPORTANT: This is the old Data Mining forum.
I keep it online so that you can read the old messages.

Please post your new messages in the new forum: https://forum2.philippe-fournier-viger.com/index.php
 
Properties of Retail and Kosarak datasets?
Posted by: maya
Date: May 22, 2016 06:00PM

Would you please tell me the average transaction size and the average size of the maximally potentially large itemsets for the two datasets (Retail and Kosarak), which are taken from the FIMI repository?

Regards
Maya

Re: Properties of Retail and Kosarak datasets?
Date: May 22, 2016 06:36PM

Statistics for the original Kosarak dataset are:

Number of transactions: approximately 990,000
Number of distinct items: 41270
Largest item id: 41270
Average transaction length : 8.09997151520906 standard deviation: 23.624398349313864 variance: 558.1121973670637

Statistics for Retail are:

Number of transactions : 88162
Number of distinct items: 16470
Smallest item id: 0
Largest item id: 16469
Average transaction length : 10.305755314080896 standard deviation: 8.162337991148334 variance: 66.62376148174341
Average item support in the database: 55.165513054037646 standard deviation: 568.0639597302245 variance: 322696.66234438214 min value: 1 max value: 50675
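
Figures like the ones above can be recomputed with a short script. The following is only a minimal Python sketch, not the tool that produced these numbers. It assumes the FIMI text layout (one transaction per line, item ids as space-separated integers), counts an item repeated within a line only once, and uses the population variance. The function name dataset_stats is made up for illustration.

```python
import math
from collections import Counter


def dataset_stats(lines):
    """Basic statistics for transactions given as lines of
    space-separated integer item ids (assumed FIMI text layout)."""
    lengths = []          # number of distinct items in each transaction
    support = Counter()   # item id -> number of transactions containing it
    for line in lines:
        items = set(line.split())  # assumption: duplicates count once
        if not items:
            continue               # skip blank lines
        lengths.append(len(items))
        support.update(items)

    if not lengths:
        raise ValueError("no transactions found")

    n = len(lengths)
    mean = sum(lengths) / n
    # Population variance (divide by n); an assumption, since the
    # original statistics do not say which estimator was used.
    variance = sum((x - mean) ** 2 for x in lengths) / n
    ids = [int(i) for i in support]
    return {
        "transactions": n,
        "distinct_items": len(support),
        "min_item_id": min(ids),
        "max_item_id": max(ids),
        "avg_length": mean,
        "std_length": math.sqrt(variance),
        "var_length": variance,
        "avg_item_support": sum(support.values()) / len(support),
    }
```

To run it on a dataset, pass an open file object, for example dataset_stats(open("retail.dat")), where retail.dat stands for whatever name the file was saved under.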


For the average size of the maximally potentially large itemsets, I do not have that information. You would first need to define what a "maximally potentially large itemset" is, and then you could write a program to calculate it.
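
As one possible reading, and this is an assumption rather than an established definition for real datasets, the quantity could be taken as the average size of the maximal frequent itemsets at a chosen minimum support, that is, the frequent itemsets that have no frequent proper superset. Given the frequent itemsets already produced by some mining algorithm, a minimal Python sketch could look like this (the function name average_maximal_size is hypothetical):

```python
def average_maximal_size(frequent_itemsets):
    """Average size of the maximal itemsets in a collection.

    An itemset is treated as maximal when no other itemset in the
    collection is a proper superset of it (assumed definition).
    """
    sets = {frozenset(s) for s in frequent_itemsets}
    if not sets:
        raise ValueError("no itemsets given")
    # Quadratic scan: fine for a sketch, slow for very large outputs.
    maximal = [s for s in sets if not any(s < t for t in sets)]
    return sum(len(s) for s in maximal) / len(maximal)
```

For example, with the itemsets {1}, {2}, {1, 2} and {3}, the maximal ones are {1, 2} and {3}, so the function returns 1.5. The result depends entirely on the minimum support used to obtain the frequent itemsets.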



Edited 6 time(s). Last edit at 05/22/2016 06:50PM by webmasterphilfv.

Re: Properties of Retail and Kosarak datasets?
Posted by: maya
Date: May 22, 2016 06:52PM

Thank you very much for your reply.

code for creating node in parallel mining
Posted by: Sandhya
Date: October 28, 2017 10:16PM

Hello,
I am looking for code to create nodes as well as clusters using the parallel FP-Growth mining algorithm on the Hadoop platform.



This forum is powered by Phorum and provided by P. Fournier-Viger (© 2012).