The Data Mining Forum
IMPORTANT: This is the old Data Mining forum.
I keep it online so that you can read the old messages.

Please post your new messages in the new forum: https://forum2.philippe-fournier-viger.com/index.php
 
AprioriRare Input Data Format
Posted by: Iris
Date: February 14, 2014 01:05PM

Hi Phillip

I'm trying to use the minimal rare itemsets algorithm. I prepared my data like this:

C98940 C98941 P97012 P97010 P97140 P97014 P97110
C98941 P97112 D99204 D99214 P97110
C98941 D99213 M99070 P97012 MJ1885 P97014 P97140 D99214 DMEE0720

Each row is a transaction.

When I try to run this, I get an error message:

java.lang.NumberFormatException: For input string: "C98940"


Does this algorithm require each item to be a number? Is there a way to get around this?

Thanks,

Iris

Re: AprioriRare Input Data Format
Date: February 14, 2014 04:14PM

Hi,

Yes, the default format uses numbers for items. The reason is efficiency.

However, SPMF also supports an alternative: the ARFF format, which allows items to be strings.

The ARFF format looks like this:

@relation LCCvsLCSH
 
    @attribute LCC string
    @attribute LCSH string
 
    @data
    AG5,   'Encyclopedias and dictionaries.;Twentieth century.'
    AS262, 'Science -- Soviet Union -- History.'
    AE5,   'Encyclopedias and dictionaries.'
    AS281, 'Astronomy, Assyro-Babylonian.;Moon -- Phases.'
    AS281, 'Astronomy, Assyro-Babylonian.;Moon -- Tables.'

This example dataset has 5 transactions, each with two string attributes.

If you want some sample datasets in ARFF format, there are a few on the "dataset" page of the SPMF website, along with a link to the formal ARFF specification.

If you use the graphical user interface of SPMF, the ARFF format is supported for all itemset and association rule mining algorithms, and you don't need to do anything special: the graphical interface detects that a file is in ARFF format.

If you want to use the ARFF format in the source code version of SPMF, see example #62 in the documentation for more information.
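For readers who would rather keep the default numeric format, another way around the NumberFormatException is to map each distinct string item to an integer before running the algorithm, and to keep the mapping so the results can be decoded afterwards. The following is only a minimal sketch in Python, run outside SPMF. The function name encode_transactions is made up for this illustration, and sorting and de-duplicating the items within each line is an assumption about what the default format expects, so check it against the SPMF documentation.

```python
def encode_transactions(lines):
    """Map each distinct string item to a positive integer ID.

    Returns the encoded transactions and the item -> ID dictionary.
    Hypothetical helper for illustration; it is not part of SPMF.
    """
    item_to_id = {}
    encoded = []
    for line in lines:
        items = line.split()  # splits on spaces or tabs
        if not items:
            continue  # skip blank lines
        ids = set()
        for item in items:
            if item not in item_to_id:
                # IDs are assigned in order of first appearance, starting at 1
                item_to_id[item] = len(item_to_id) + 1
            ids.add(item_to_id[item])
        # Assumption: items in a line should be unique and in ascending order
        encoded.append(sorted(ids))
    return encoded, item_to_id


raw = [
    "C98940 C98941 P97012 P97010 P97140 P97014 P97110",
    "C98941 P97112 D99204 D99214 P97110",
    "C98941 D99213 M99070 P97012 MJ1885 P97014 P97140 D99214 DMEE0720",
]

encoded, item_to_id = encode_transactions(raw)

# One transaction per line, items separated by single spaces
spmf_text = "\n".join(" ".join(str(i) for i in t) for t in encoded)

# Reverse mapping, used to translate integer results back to the original codes
id_to_item = {v: k for k, v in item_to_id.items()}

print(spmf_text)
```

Writing spmf_text to a file gives an input with integer items only, and id_to_item turns the integers in the output back into codes such as C98940.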

Best,

Philippe

Re: AprioriRare Input Data Format
Posted by: Pooja jardosh
Date: December 22, 2014 02:27AM

20 156 161 189 207 209 230 239 271 279 282 285 288 294 300 310 315 320 345 383
20 156 161 189 207 209 230 239 271 279 282 285 288 294 300 310 315 320 345 383
20 156 160 189 203 210 220 239 270 278 282 285 288 292 302 310 315 320 332 385
20 156 159 189 201 211 213 246 269 276 282 285 288 293 300 310 315 320 321 382
20 156 159 189 201 211 213 246 269 276 282 285 288 293 300 310 315 320 321 382
20 156 162 189 204 209 217 239 270 278 282 285 288 295 300 310 315 320 329 384
20 156 162 189 204 209 217 239 270 278 282 285 288 295 300 310 315 320 329 384
20 156 162 189 204 209 217 239 270 278 282 285 288 295 300 310 315 320 329 384
20 156 162 189 204 209 217 239 270 278 282 285 288 295 300 310 315 320 329 384

Suppose this is my dataset. Each row is a transaction.
How can I convert this tab-separated file to an ARFF file?

Please reply

Re: AprioriRare Input Data Format
Date: December 22, 2014 03:54AM

Your file would work fine with SPMF as it is.

Now to answer your question, I don't have any source code that would convert this format to ARFF.

You could write your own code to do that by reading the ARFF file format specification:
http://weka.wikispaces.com/ARFF+%28stable+version%29
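As a starting point for such a converter, here is a rough sketch in Python. It is not SPMF code. It assumes the items are integers separated by tabs or spaces, and it uses one common ARFF convention for transaction data: one nominal attribute per distinct item with the single value t, and ? (a missing value) where the item is absent. The function name transactions_to_arff and the item_ attribute prefix are made up for this illustration, and whether a particular ARFF reader interprets missing values this way should be checked against the specification and the tool's documentation.

```python
def transactions_to_arff(lines, relation="transactions"):
    """Build ARFF text from transaction lines of integer items.

    Hypothetical helper for illustration only.
    """
    # split() handles both tabs and spaces; blank lines are ignored
    transactions = [set(line.split()) for line in lines if line.strip()]

    # Assumption: every item is an integer, so sort the columns numerically
    items = sorted({item for t in transactions for item in t}, key=int)

    out = ["@relation " + relation, ""]
    for item in items:
        # One nominal attribute per item, with "t" as its only value
        out.append("@attribute item_" + item + " {t}")
    out += ["", "@data"]
    for t in transactions:
        # "t" if the transaction contains the item, "?" (missing) otherwise
        out.append(",".join("t" if item in t else "?" for item in items))
    return "\n".join(out) + "\n"


sample = [
    "20\t156\t161",
    "20\t156\t160",
]

arff = transactions_to_arff(sample)
print(arff)
```

With many distinct items the dense rows become wide. The ARFF specification also describes a sparse data format, which may be a better fit for large transaction files.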



This forum is powered by Phorum and provided by P. Fournier-Viger (© 2012).