The Data Mining Forum
IMPORTANT: This is the old Data Mining forum.
I keep it online so that you can read the old messages.

Please post your new messages in the new forum: https://forum2.philippe-fournier-viger.com/index.php
 
Help with the ECLAT algorithm: how to calculate tidsets and print them
Date: March 27, 2014 08:59AM

Today, I received the following question by e-mail:

"I am still confused because in the ECLAT algorithm, if we want to generate 2-itemsets from 1-itemsets, the intersection of the 1-itemsets is required, and I don't know how to intersect them. Also, if we want to print the transaction ids with the itemsets, what should we do?"

I will give the answer below.



Edited 2 time(s). Last edit at 03/27/2014 09:02AM by webmasterphilfv.

Re: about the ECLAT algorithm
Date: March 27, 2014 08:59AM

Let's say that you have a set of three transactions:

Transaction 1: 1 2 4
Transaction 2: 1 4
Transaction 3: 1 2 5

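Eclat will transform the database into a vertical database, which associates each item with its tidset (the set of ids of the transactions that contain the item):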

Item 1: transaction 1, transaction 2, transaction 3
Item 2: transaction 1, transaction 3
Item 4: transaction 1, transaction 2
Item 5: transaction 3
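To make this transformation concrete, here is a minimal Java sketch. It assumes that each transaction is a list of integer items and that transaction ids start at 1. The class and method names are hypothetical and are not the ones used in SPMF:

```java
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;
import java.util.TreeSet;

// A minimal sketch (not SPMF's code): converts a horizontal database, where each
// transaction is a list of integer items, into a vertical database that maps each
// item to the set of ids of the transactions containing it (its tidset).
public class VerticalDatabase {

    // Transaction ids are numbered from 1, matching the forum example.
    static Map<Integer, Set<Integer>> toVertical(List<List<Integer>> transactions) {
        Map<Integer, Set<Integer>> tidsets = new TreeMap<>();
        for (int i = 0; i < transactions.size(); i++) {
            int tid = i + 1;
            for (int item : transactions.get(i)) {
                // Create the tidset the first time the item is seen, then add this tid.
                tidsets.computeIfAbsent(item, k -> new TreeSet<>()).add(tid);
            }
        }
        return tidsets;
    }

    public static void main(String[] args) {
        List<List<Integer>> db = List.of(
                List.of(1, 2, 4),  // transaction 1
                List.of(1, 4),     // transaction 2
                List.of(1, 2, 5)); // transaction 3
        Map<Integer, Set<Integer>> vertical = toVertical(db);
        for (Map.Entry<Integer, Set<Integer>> e : vertical.entrySet()) {
            System.out.println("item " + e.getKey() + ": " + e.getValue());
        }
    }
}
```

On the example above, it prints one line per item, for instance "item 1: [1, 2, 3]" and "item 5: [3]".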


If you want to combine single items to generate an itemset of size 2, then you need to intersect their tidsets.

For example, if you have the 1-itemsets {item 1} and {item 2} and you want to calculate the tidset of the 2-itemset {item 1, item 2}, then:

the intersection of {transaction 1, transaction 2, transaction 3} and {transaction 1, transaction 3} is {transaction 1, transaction 3}.


Thus, the tidset of the 2-itemset {item 1, item 2} is {transaction 1, transaction 3}. The support of {item 1, item 2} is the size of this tidset, that is 2.
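As an illustration, here is a minimal Java sketch of this intersection. It assumes that tidsets are stored as sets of integer transaction ids. The names are hypothetical, not SPMF's classes:

```java
import java.util.Set;
import java.util.TreeSet;

// A minimal sketch of the tidset intersection used to join two itemsets.
// The tidsets below are the ones of items 1 and 2 in the forum example.
public class TidsetIntersection {

    // Returns a new set holding the tids present in both inputs; the inputs are left unchanged.
    static Set<Integer> intersect(Set<Integer> a, Set<Integer> b) {
        Set<Integer> result = new TreeSet<>(a);
        result.retainAll(b);
        return result;
    }

    public static void main(String[] args) {
        Set<Integer> tidsetItem1 = new TreeSet<>(Set.of(1, 2, 3));
        Set<Integer> tidsetItem2 = new TreeSet<>(Set.of(1, 3));
        Set<Integer> tidset12 = intersect(tidsetItem1, tidsetItem2);
        // The support of {1, 2} is simply the size of its tidset.
        System.out.println("{1, 2}: tidset = " + tidset12 + ", support = " + tidset12.size());
    }
}
```

This prints "{1, 2}: tidset = [1, 3], support = 2". The copy keeps the tidsets of {item 1} and {item 2} intact, because they are still needed to build other itemsets. When transaction ids are small integers, a java.util.BitSet combined with its and() method is a common alternative representation.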

For the second question: if you are using my source code and the version of Eclat that saves its results to a file, you can print the tidsets by modifying the method save() in the file AlgoEclat.java.

Replace

writer.write(node.getItemset().toString() + " #SUP: " + node.getTidset().size());

by this:

writer.write(node.getItemset().toString() + " #SUP: " + node.getTidset().size() + " #TIDSET: " + node.getTidset());
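To put both answers together, the following self-contained Java sketch builds all frequent 2-itemsets of the example with a minimum support of 2 and prints each one with its support and tidset. The names are hypothetical and independent of the SPMF classes, and the line format only loosely imitates the modified save():

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;
import java.util.TreeSet;

// A minimal sketch (not SPMF's code) of the second level of Eclat: every pair of
// frequent items is joined by intersecting their tidsets, and each frequent
// 2-itemset is reported with its support and its tidset.
public class EclatLevel2 {

    static List<String> frequentPairs(Map<Integer, Set<Integer>> vertical, int minsup) {
        // Keep only the frequent items (the map is a TreeMap, so items are in ascending order).
        List<Integer> items = new ArrayList<>();
        for (Map.Entry<Integer, Set<Integer>> e : vertical.entrySet()) {
            if (e.getValue().size() >= minsup) items.add(e.getKey());
        }
        List<String> lines = new ArrayList<>();
        for (int i = 0; i < items.size(); i++) {
            for (int j = i + 1; j < items.size(); j++) {
                // Tidset of {items[i], items[j]} = intersection of the two tidsets.
                Set<Integer> tidset = new TreeSet<>(vertical.get(items.get(i)));
                tidset.retainAll(vertical.get(items.get(j)));
                if (tidset.size() >= minsup) {
                    lines.add(items.get(i) + " " + items.get(j)
                            + " #SUP: " + tidset.size() + " #TIDSET: " + tidset);
                }
            }
        }
        return lines;
    }

    public static void main(String[] args) {
        Map<Integer, Set<Integer>> vertical = new TreeMap<>();
        vertical.put(1, Set.of(1, 2, 3));
        vertical.put(2, Set.of(1, 3));
        vertical.put(4, Set.of(1, 2));
        vertical.put(5, Set.of(3));
        frequentPairs(vertical, 2).forEach(System.out::println);
    }
}
```

It prints "1 2 #SUP: 2 #TIDSET: [1, 3]" and "1 4 #SUP: 2 #TIDSET: [1, 2]". The pair {item 2, item 4} is dropped because its tidset is only {transaction 1}, a support of 1.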

Re: about the ECLAT algorithm
Posted by: manthan
Date: March 30, 2014 03:04AM

I want to know: if I have words instead of items, how can I print word patterns instead of itemsets with transaction ids?

Re: about the ECLAT algorithm
Date: March 30, 2014 03:09AM

In SPMF, items are integers.

If you want to use words instead of integers, you have the following options:
1) Modify the source code so that the algorithm uses String instead of Integer for an item. In this case, be careful to replace every == comparison between items by a call to equals(), because a String should in general not be compared with another String using == (it compares references, not contents).
2) If you don't want to modify the source code, another option is to use the ARFF format as input: convert your file to ARFF and then give it to the algorithm. The ARFF format represents items as strings, so it works without modifying the source code of SPMF. The documentation contains an example that shows how to use the ARFF format; read it carefully before converting your data.
3) Another option is to write your own code for pre-processing and post-processing: convert the strings to integers before running the algorithm, and then convert the integers in the results back to strings.
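For option 3, a minimal Java sketch of the conversion could look like the one below. The class WordEncoder is hypothetical and not part of SPMF. It assumes that each word becomes one item and that ids start at 1:

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// A minimal sketch of option 3 (pre- and post-processing), not part of SPMF:
// each distinct word gets an integer id before mining, and ids found in the
// results are translated back to words afterwards.
public class WordEncoder {
    private final Map<String, Integer> wordToId = new HashMap<>();
    private final List<String> idToWord = new ArrayList<>();

    // Pre-processing: returns the id of a word, assigning the next id (from 1) if it is new.
    int encode(String word) {
        Integer id = wordToId.get(word); // HashMap compares keys with equals(), not ==
        if (id == null) {
            idToWord.add(word);
            id = idToWord.size();
            wordToId.put(word, id);
        }
        return id;
    }

    // Post-processing: converts an id found in the mining results back to its word.
    String decode(int id) {
        return idToWord.get(id - 1);
    }

    public static void main(String[] args) {
        WordEncoder enc = new WordEncoder();
        String[][] transactions = {{"bread", "milk"}, {"bread", "eggs"}};
        for (String[] t : transactions) {
            StringBuilder line = new StringBuilder();
            for (String word : t) {
                if (line.length() > 0) line.append(' ');
                line.append(enc.encode(word));
            }
            System.out.println(line); // "1 2", then "1 3"
        }
        // Suppose the algorithm reported the itemset {1}: map it back to a word.
        System.out.println(enc.decode(1)); // bread
    }
}
```

The same encoder object must be kept between the two steps, so that the ids in the results are decoded with the dictionary that was used to encode the input.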



Edited 2 time(s). Last edit at 03/30/2014 03:11AM by webmasterphilfv.



This forum is powered by Phorum and provided by P. Fournier-Viger (© 2012).