Properties of Retail and Kosarak datasets?

Posted by: maya

Date: May 22, 2016 06:00PM

Would you please tell me the average transaction size and the average size of the maximally potentially large itemsets for the two datasets (Retail and Kosarak), which are taken from the FIMI repository?

Regards
Maya

Re: Properties of Retail and Kosarak datasets?

Posted by: webmasterphilfv

Date: May 22, 2016 06:36PM

Statistics for the original Kosarak dataset are:

Number of transactions: approximately 990,000
Number of distinct items: 41270
Largest item id: 41270
Average transaction length: 8.09997151520906 (standard deviation: 23.624398349313864, variance: 558.1121973670637)

Statistics for Retail are:

Number of transactions: 88162
Number of distinct items: 16470
Smallest item id: 0
Largest item id: 16469
Average transaction length: 10.305755314080896 (standard deviation: 8.162337991148334, variance: 66.62376148174341)
Average item support in the database: 55.165513054037646 (standard deviation: 568.0639597302245, variance: 322696.66234438214, min value: 1, max value: 50675)
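
Statistics of this kind can be recomputed with a short script. The following is only a minimal sketch, not the tool that produced the numbers above. It assumes the usual FIMI text layout (one transaction per line, items as space-separated integer ids), counts each item at most once per transaction, and uses the population standard deviation; all of these are assumptions. The function name `dataset_stats` and the tiny `sample` input are hypothetical.

```python
from collections import Counter
from statistics import mean, pstdev


def dataset_stats(lines):
    """Basic statistics for transactions given as lines of space-separated item ids."""
    lengths = []         # number of distinct items in each transaction
    support = Counter()  # item -> number of transactions containing it
    for line in lines:
        items = set(line.split())  # assumption: duplicates within a line count once
        if not items:
            continue               # skip blank lines
        lengths.append(len(items))
        support.update(items)
    item_ids = [int(i) for i in support]
    return {
        "transactions": len(lengths),
        "distinct_items": len(support),
        "min_item_id": min(item_ids),
        "max_item_id": max(item_ids),
        "avg_length": mean(lengths),
        "std_length": pstdev(lengths),  # assumption: population standard deviation
        "avg_support": mean(support.values()),
        "min_support": min(support.values()),
        "max_support": max(support.values()),
    }


# Tiny hypothetical input; for a real file, pass an open file object instead.
sample = ["0 1 2", "1 2", "2 3 4 5"]
stats = dataset_stats(sample)
print(stats)
```

For a downloaded dataset, something like `with open("retail.dat") as f: print(dataset_stats(f))` should work, where the file name is a placeholder for wherever the file was saved.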

I do not have the average size of the maximally potentially large itemsets. You would first need to define what a "maximally potentially large itemset" is; then you could write a program to calculate it.
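
As an illustration only: if one chooses to read the term as "maximal frequent itemsets at a given minimum support" (an assumption, not a definition given in this thread), the average size could be computed along the following lines. This is a naive level-wise sketch that is practical only for small inputs; for Retail or Kosarak a dedicated maximal itemset miner would be needed. The function names and the toy transactions are hypothetical.

```python
from itertools import combinations


def frequent_itemsets(transactions, minsup):
    """Naive level-wise enumeration of itemsets with support count >= minsup."""
    transactions = [frozenset(t) for t in transactions]

    def support(itemset):
        return sum(1 for t in transactions if itemset <= t)

    items = sorted({i for t in transactions for i in t})
    level = [frozenset([i]) for i in items if support(frozenset([i])) >= minsup]
    frequent = []
    while level:
        frequent.extend(level)
        # Join pairs of frequent k-itemsets that differ in exactly one item.
        candidates = {a | b for a, b in combinations(level, 2)
                      if len(a | b) == len(a) + 1}
        level = [c for c in candidates if support(c) >= minsup]
    return frequent


def average_maximal_size(transactions, minsup):
    """Average size of frequent itemsets that have no frequent proper superset."""
    freq = frequent_itemsets(transactions, minsup)
    maximal = [s for s in freq if not any(s < other for other in freq)]
    return sum(len(s) for s in maximal) / len(maximal) if maximal else 0.0


# Toy data: with minsup = 2 the maximal frequent itemsets are {1, 2} and {2, 3}.
toy = [{1, 2, 3}, {1, 2}, {2, 3}, {4}]
avg_size = average_maximal_size(toy, minsup=2)
print(avg_size)
```

Note that the result depends entirely on the chosen minimum support, so any number reported this way should state the threshold used.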

Edited 6 time(s). Last edit at 05/22/2016 06:50PM by webmasterphilfv.

Re: Properties of Retail and Kosarak datasets?

Posted by: maya

Date: May 22, 2016 06:52PM

Thank you very much for your reply.

code for creating node in parallel mining

Posted by: Sandhya

Date: October 28, 2017 10:16PM

Hello,
I would like code for creating nodes as well as clusters using the parallel FP-Growth mining algorithm on the Hadoop platform.
