The Data Mining Forum
IMPORTANT: This is the old Data Mining forum.
I keep it online so that you can read the old messages.

Please post your new messages in the new forum: https://forum2.philippe-fournier-viger.com/index.php
 
Input File conversion
Posted by: Al Mahdi
Date: November 16, 2018 03:42PM

Hello everyone,

I have a dataset similar to the one in this image: (http://docsdrive.com/images/academicjournals/jm/2011/tab5-2k11-510-523.gif). I have 200 bacterial samples, each tested against 10 different antibiotics. 'R' marks samples that were resistant to an antibiotic, while 'S' marks samples that were sensitive to it.

My goal is to mine infrequent association rules using AprioriInverse, but first I need to make my data suitable for SPMF.

From the documentation, it seems that my data needs to be in the form of a transaction database. If my data is, say, in a .csv file, what steps should I follow to make it acceptable to SPMF?

I have already used Apriori in other libraries, but I obtained only high-confidence rules, while I am interested in rare associations. Here is one example:

Cefepime=S Imipeneme=S Ertapenem=S Norfloxacin=S 311 ==> Ceftriaxone=S Amikacin=S 302 <conf:(0.97)> lift:(1.22) lev:(0.11) [54] conv:(6.31)



Any input on this is appreciated.

Re: Input File conversion
Posted by: Al Mahdi
Date: November 17, 2018 02:00AM

TID A B C D E
T1 1 1 1 0 0
T2 1 1 1 1 1
T3 1 0 1 1 0
T4 1 0 1 1 1
T5 1 1 1 1 0

So my dataset is indeed a transaction database.

Re: Input File conversion
Date: November 17, 2018 04:14AM

So basically, you have items and transactions, so you just need to convert your data to the SPMF format.

For example, a database like this:

TID A B C D E
T1 1 1 1 0 0
T2 1 1 1 1 1
T3 1 0 1 1 0
T4 1 0 1 1 1
T5 1 1 1 1 0

would look like this in SPMF format:

1 2 3
1 2 3 4 5
1 3 4
1 3 4 5
1 2 3 4

where 1 = A, 2 = B, 3 = C, 4 = D, 5 = E and where each line is a transaction.
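For instance, here is a minimal Python sketch of such a conversion. It assumes the table is stored as whitespace-separated text with a header row (TID followed by the item names) and one 0/1 row per transaction. The function name binary_table_to_spmf is a hypothetical name used only for illustration; it is not part of SPMF.

```python
def binary_table_to_spmf(table_text: str) -> tuple[list[str], dict[int, str]]:
    """Convert a whitespace-separated 0/1 table (first column = TID) to SPMF lines.

    Hypothetical helper for illustration; it is not part of SPMF.
    """
    rows = [line.split() for line in table_text.strip().splitlines() if line.strip()]
    header, data_rows = rows[0], rows[1:]
    # Item ids follow the column order: 1 = first item column, 2 = second, ...
    id_to_name = {i + 1: name for i, name in enumerate(header[1:])}
    spmf_lines = []
    for row in data_rows:
        flags = row[1:]  # drop the transaction id (T1, T2, ...)
        # Keep the id of every column marked "1"; ids come out in ascending order.
        items = [str(i + 1) for i, flag in enumerate(flags) if flag == "1"]
        spmf_lines.append(" ".join(items))
    return spmf_lines, id_to_name


table = """TID A B C D E
T1 1 1 1 0 0
T2 1 1 1 1 1
T3 1 0 1 1 0
T4 1 0 1 1 1
T5 1 1 1 1 0
"""
spmf_lines, id_to_name = binary_table_to_spmf(table)
print("\n".join(spmf_lines))
```

Writing these lines to a text file, one transaction per line, gives the input file. Keeping id_to_name around makes it easy to translate the numeric items in the output back to the original column names.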

The format is not very complicated. You can write a short program (maybe 10-20 lines of code) in any language you like to convert your data to this format. Then, you can apply AprioriInverse or the other algorithms offered in SPMF. I suggest trying several algorithms; for example, indirect association rule mining may also be interesting for your problem.
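For data like the antibiotic table in the first post, where each cell holds 'R' or 'S' rather than 0/1, one option is to treat each column=value pair (e.g. Cefepime=S) as its own item, which matches the style of the rule quoted in the question. Below is a rough Python sketch under these assumptions: the CSV has a header row, a sample-id column named "Sample" (a hypothetical name), and one column per antibiotic. The function name categorical_csv_to_spmf and the three toy rows are made up for illustration only.

```python
import csv
import io


def categorical_csv_to_spmf(csv_text: str, id_column: str = "Sample"):
    """Map each "column=value" pair to an integer item id (hypothetical helper)."""
    reader = csv.DictReader(io.StringIO(csv_text))
    item_ids: dict[str, int] = {}
    spmf_lines: list[str] = []
    for row in reader:
        ids = set()
        for column, value in row.items():
            # Skip the sample id, extra unnamed fields, and missing values.
            if column is None or column == id_column or not value or not value.strip():
                continue
            item = f"{column}={value.strip()}"
            if item not in item_ids:
                item_ids[item] = len(item_ids) + 1  # ids start at 1, first-seen order
            ids.add(item_ids[item])
        # One transaction per sample: distinct ids, ascending, space-separated.
        spmf_lines.append(" ".join(str(i) for i in sorted(ids)))
    return spmf_lines, item_ids


# Made-up toy rows, only to show the shape of the input.
csv_text = """Sample,Cefepime,Amikacin,Norfloxacin
B1,S,S,R
B2,R,S,S
B3,S,R,S
"""
lines, item_ids = categorical_csv_to_spmf(csv_text)
print("\n".join(lines))
print(item_ids)
```

Ids are assigned in order of first appearance, and each transaction lists its distinct ids in ascending order. If only resistance is of interest, you could keep only the "=R" items instead, which would change which rules can be found.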



This forum is powered by Phorum and provided by P. Fournier-Viger (© 2012).