The Data Mining Forum
IMPORTANT: This is the old Data Mining forum.
I keep it online so that you can read the old messages.

Please post your new messages in the new forum: https://forum2.philippe-fournier-viger.com/index.php
 
What is the advantage of using numbers for items in the input file?
Posted by: azerty1
Date: November 07, 2019 12:37AM

Hello,

I would like to know what the benefit is of encoding items as numbers in SPMF input files. According to SPMF, all items have to be replaced with numbers.

Re: What is the advantage of using numbers for items in the input file?
Date: November 11, 2019 06:55PM

Hi,

The reason for using numbers is efficiency. Most of the algorithms in SPMF represent items internally as integers, because comparing integers is faster than comparing strings, and integers require less memory than strings. For example, comparing two integers such as 12 and 13 typically takes a single CPU instruction, while comparing two strings such as "banana" and "banana juice" may require comparing many characters one by one. Moreover, 12 takes 32 or 64 bits of memory on your computer, while "banana" has to store each of its characters and may need several times as much, depending on the representation. So this is why integers are used to represent items internally.
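To illustrate the idea, here is a minimal sketch in Python of how item names could be replaced by integers before mining. This is not the code used by SPMF; the function name `encode_transactions` and the sample data are assumptions made only for this example.

```python
def encode_transactions(transactions):
    """Give each distinct item name a small integer id, in order of first appearance.

    Hypothetical helper for illustration only; it is not part of SPMF.
    """
    name_to_id = {}
    encoded = []
    for transaction in transactions:
        row = []
        for name in transaction:
            if name not in name_to_id:
                # Ids start at 1, as in the SPMF example files.
                name_to_id[name] = len(name_to_id) + 1
            row.append(name_to_id[name])
        encoded.append(row)
    return name_to_id, encoded


transactions = [["apple", "tomato", "milk"], ["orange", "tomato", "bread"]]
name_to_id, encoded = encode_transactions(transactions)
print(name_to_id)
print(encoded)
```

After this step an algorithm only has to compare small integers, and the dictionary is needed again only when the results are shown to the user.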

Now, in the input files, you can use integers or, as explained in the documentation, you can also define names for items, for most algorithms. For example, the documentation of Apriori shows that you can use this format:

@CONVERTED_FROM_TEXT
@ITEM=1=apple
@ITEM=2=orange
@ITEM=3=tomato
@ITEM=4=milk
@ITEM=5=bread
1 3 4
2 3 5
1 2 3 5
2 5
1 2 3 5

This format declares that item 1 stands for "apple", item 2 for "orange", and so on. You can also use the ARFF format with SPMF. Both formats work with the user interface and the command line of SPMF. Using them with the source code version of SPMF is also possible, but I would need to explain to you how to do it.
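As a rough illustration of how the @ITEM lines in the example file can be read, here is a small Python sketch that parses the file content and prints each transaction with item names instead of numbers. It is not the SPMF parser; the function name `parse_spmf_text` is hypothetical, and the sketch assumes that every other line starting with "@" is metadata that can be skipped.

```python
def parse_spmf_text(text):
    """Split the file content into an id-to-name table and a list of transactions.

    Hypothetical helper for illustration only; it is not part of SPMF.
    """
    item_names = {}
    transactions = []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith("@ITEM="):
            # "@ITEM=1=apple" -> id 1, name "apple"
            item_id, name = line[len("@ITEM="):].split("=", 1)
            item_names[int(item_id)] = name
        elif line.startswith("@"):
            # Assumption: other "@" lines such as @CONVERTED_FROM_TEXT are metadata.
            continue
        else:
            transactions.append([int(token) for token in line.split()])
    return item_names, transactions


sample = """@CONVERTED_FROM_TEXT
@ITEM=1=apple
@ITEM=2=orange
@ITEM=3=tomato
@ITEM=4=milk
@ITEM=5=bread
1 3 4
2 3 5
1 2 3 5
2 5
1 2 3 5
"""

item_names, transactions = parse_spmf_text(sample)
# Translate ids back to names; fall back to the number if no name was declared.
named = [[item_names.get(i, str(i)) for i in t] for t in transactions]
for row in named:
    print(" ".join(row))
```

The transactions stay as integers for the mining step, and the names are looked up only for display.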

Thanks for using SPMF. Best regards.

Philippe



This forum is powered by Phorum and provided by P. Fournier-Viger (© 2012).