Data set for TRuleGrowth algorithm (sequential rule mining)



Posted by: happy

Date: April 27, 2013 08:54PM

Hi,

I have a dataset of the following type:
1 2 3 -1 -2
1 4 3 2 3 -1 -2
5 6 1 2 -1 -2
3 2 3 -1 -2
As I understand it, each row of the dataset contains items that occur together rather than one after the other, and each row is a sequence (correct me if I am wrong). If that is right, why does the TRuleGrowth algorithm give no output?

Thanks

Edited 1 time(s). Last edit at 02/11/2014 04:28AM by webmasterphilfv.


Re: Data set

Posted by: Happy

Date: April 27, 2013 10:36PM

Hi,

I am not able to see the views posted. Can anyone guide me on this?

I am sorry if the procedure is very simple and I am still not able to work it out.

Thanks


Re: Data set

Posted by: webmasterphilfv

Date: April 28, 2013 06:37AM

Hi Happy,

The input of TRuleGrowth is a sequence database, where each line is a sequence.

If all items appear at the same time in each sequence, TRuleGrowth will generate no result, because it looks for rules of the form X ==> Y, where X and Y are groups of items such that the items of X happen before the items of Y. TRuleGrowth tries to find rules that are common to several sequences.

If all the items appear simultaneously in each line of your dataset, then it is not a sequence database. You could consider it as a transaction database (where there is no time) and apply an association rule mining algorithm instead. An association rule mining algorithm will generate rules too, but they will carry no time information.

Another important point is that these algorithms generally assume that items appearing together are sorted according to some order, so that pruning strategies can be applied. If the items are not sorted, the algorithms may not generate the correct result. Also, an item should not appear twice in the same itemset. So, for example, you could sort each line by increasing order of items and remove duplicate items to obtain:

1 2 3 -1 -2
1 2 3 4 -1 -2
1 2 5 6 -1 -2
2 3 -1 -2

Now, if you want to apply an association rule mining algorithm in SPMF, you also need to remove the -1 and -2, like this:

1 2 3
1 2 3 4
1 2 5 6
2 3
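The clean-up described above (sort the items, drop duplicates, remove the -1 and -2 separators) can be sketched in a few lines of Python. This is only an illustration and not part of SPMF; the function name sequence_line_to_transaction is hypothetical. It assumes each line holds a single itemset, as in the data above, so merging everything on a line into one transaction loses nothing.

```python
def sequence_line_to_transaction(line):
    """Turn one SPMF sequence line into a sorted, duplicate-free transaction line."""
    items = set()
    for token in line.split():
        value = int(token)
        # -1 ends an itemset and -2 ends the sequence; neither is an item.
        if value >= 0:
            items.add(value)
    return " ".join(str(item) for item in sorted(items))


# The four lines from the original question.
sequences = [
    "1 2 3 -1 -2",
    "1 4 3 2 3 -1 -2",
    "5 6 1 2 -1 -2",
    "3 2 3 -1 -2",
]
transactions = [sequence_line_to_transaction(s) for s in sequences]
for t in transactions:
    print(t)
```

If a line contained several itemsets separated by -1, this sketch would merge them into one transaction, which throws away the ordering on purpose.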

Hope this helps,

Philippe

Edited 2 time(s). Last edit at 04/28/2013 06:40AM by webmasterphilfv.


Re: Data set

Posted by: Happy

Date: May 03, 2013 03:32AM

Hi,

I am trying to find association rules in multidimensional, non-sequential data related to time and location. I saw the SeqDIM implementation on your site and used it with the following (weather) data:

300 18 -3 406 710 511 315 605 -1 -2
300 23 -3 406 710 511 314 605 -1 -2
300 33 -3 406 710 511 314 605 -1 -2
300 34 -3 406 708 512 313 605 -1 -2

The second part is non-sequential, as you mentioned.
It is giving me interesting patterns. Is it OK to use the algorithm in this case, or will it give incorrect results?

Thanks


Re: Data set

Posted by: webmasterphilfv

Date: May 04, 2013 05:52AM

Hi,

If the results are interesting for your application, I think that it could make sense.

What you will get are patterns that frequently appear in your data.

They would not be like association rules, because an association rule has two measures: the support (a frequency) and the confidence (a probability). Sequential patterns have no confidence. If you don't need the confidence for your application, then it is OK, according to what you have described to me.
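To make the two measures concrete, here is a minimal Python sketch. It uses the small transaction database from earlier in this thread rather than the weather data, and the function names are hypothetical. Support is counted as a number of transactions, and the confidence of X ==> Y is support(X ∪ Y) / support(X).

```python
# Toy transaction database (the cleaned example from earlier in the thread).
transactions = [{1, 2, 3}, {1, 2, 3, 4}, {1, 2, 5, 6}, {2, 3}]


def support(itemset, transactions):
    """Number of transactions that contain every item of the itemset."""
    return sum(1 for t in transactions if itemset <= t)


def confidence(antecedent, consequent, transactions):
    """support(X u Y) / support(X), or 0.0 if X never occurs."""
    sup_x = support(antecedent, transactions)
    if sup_x == 0:
        return 0.0
    return support(antecedent | consequent, transactions) / sup_x


# Rule {2} ==> {3}: the pattern {2, 3} has a support, the rule also has a confidence.
rule_support = support({2, 3}, transactions)
rule_confidence = confidence({2}, {3}, transactions)
print(rule_support, rule_confidence)
```

A frequent pattern only carries the first number; the second one needs the support of the antecedent as well.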

Best,

Philippe


Re: Data set

Posted by: Vathsala

Date: June 24, 2013 04:09AM

Hi,

For the data shown in my previous post, the multidimensional sequential pattern mining algorithm gives me a frequent pattern such as [ 300 23 ]{t=0, 406 512 605 } supp:4.
Can I read these patterns into the MainTestAllAssociationRules_FPGrowth_version algorithm to obtain association rules, with modifications as required?
Or can you give me another idea for deriving association rules, not sequential rules, from such patterns?

Thanks


Re: Data set

Posted by: webmasterphilfv

Date: June 24, 2013 07:38AM

Hi,

If I understand correctly, you want to find multi-dimensional association rules.

Association rules are generally created in two steps: (1) generating frequent itemsets and then (2) using the itemsets to generate association rules.

The multidimensional patterns that you have found can be considered as itemsets. So it would be possible to generate the association rules but it would require some programming.

Let's say that you have found three patterns:

[ 300 23 ]{t=0, 512 605 } supp:4

[ 300 23 ]{t=0, 406 512 605 } supp:4

[ 300 5 ]{t=0, 407 512 605 } supp:4

You would need to put the patterns that have the same dimension values together. For example, these two patterns have the same dimension values:

[ 300 23 ]{t=0, 512 605 } supp:4

[ 300 23 ]{t=0, 406 512 605 } supp:4

which is rewritten as:

512 605

406 512 605

Then you would need to apply the Agrawal algorithm for generating association rules to these itemsets. This would give you the rules.

However, a better solution would be to modify SEQDIM to use FPGrowth directly instead of PrefixSpan. It would be more efficient.

In any case, both of these solutions would require some programming. You would also need to think it through to make sure that there are no logical problems (I only thought about it for a few minutes).
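As a rough illustration of the grouping step, the Python sketch below parses pattern lines written exactly like the three examples above and groups their itemsets by dimension values. The regular expression and the function names are assumptions made for this sketch; they are not SPMF code, and they only handle patterns with a single itemset at t=0.

```python
import re
from collections import defaultdict

# Matches lines such as: [ 300 23 ]{t=0, 512 605 } supp:4
PATTERN_RE = re.compile(
    r"\[\s*([^\]]*?)\s*\]\{t=\d+,\s*([^}]*?)\s*\}\s*supp:(\d+)"
)


def parse_pattern(line):
    """Return (dimension values, items, support) for one pattern line."""
    match = PATTERN_RE.search(line)
    if match is None:
        raise ValueError("unrecognised pattern line: " + line)
    dimensions = tuple(match.group(1).split())
    items = tuple(match.group(2).split())
    return dimensions, items, int(match.group(3))


def group_by_dimensions(lines):
    """Map each tuple of dimension values to its list of (items, support)."""
    groups = defaultdict(list)
    for line in lines:
        dimensions, items, supp = parse_pattern(line)
        groups[dimensions].append((items, supp))
    return dict(groups)


patterns = [
    "[ 300 23 ]{t=0, 512 605 } supp:4",
    "[ 300 23 ]{t=0, 406 512 605 } supp:4",
    "[ 300 5 ]{t=0, 407 512 605 } supp:4",
]
groups = group_by_dimensions(patterns)
for dimensions, itemsets in groups.items():
    print(dimensions, itemsets)
```

Each group can then be treated as a set of itemsets with supports and handed to a rule generation step.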

Philippe


Re: Data set

Posted by: Vathsala

Date: June 24, 2013 08:17PM

Hi,

Thanks. I was thinking along similar lines but had some doubts about it, so I wanted to check whether the idea was correct. Now I will go ahead.

Thanks


Re: Data set

Posted by: Vathsala

Date: July 08, 2013 10:25PM

Hi,

I was able to get association rules by implementing the idea. I want to compare it with a similar algorithm for multidimensional association rule mining; can you suggest one?
Also, I am doing this as part of my PhD research. Can it be considered potential work to be published?

Thanks


Re: Data set

Posted by: webmasterphilfv

Date: July 09, 2013 03:22AM

Hi,

I don't know any algorithm for that. You would need to search on Google Scholar to see if some exists.

I think that you can publish an article on that, but maybe not in a very competitive conference or journal. For a smaller conference or journal, I think that it could be OK. I say that because reviewers may find the idea a little bit straightforward, unless you have also added some other new ideas.

Philippe


Re: Data set

Posted by: Vathsala.H

Date: January 14, 2014 07:54AM

Hi,

I was thinking about comparing two versions of a model data set using association rules. My idea is as follows:

First, find association rules from the v1 data and apply them to the observed data set for prediction. Find the % error.

Next, find association rules from the v2 data and apply them to the observed data set for prediction. Find the % error, and compare v1 and v2 based on the % error.

Is this OK? Do you have any comments, suggestions, or another way to perform this?

Thanks and Regards
Vathsala.H


Re: Data set

Posted by: webmasterphilfv

Date: January 14, 2014 08:38AM

Hello,

So if I understand correctly, your goal is to compare two sets of association rules that come from two different data files by comparing their prediction accuracy on a third data file.

I do not see any problem with this idea.

How to perform this?

Well, you can find the source code for association rule mining in SPMF.

Then you would need to write the code to calculate the prediction accuracy on the test data.

As for the results of your project, I think that you may find something interesting, but it will depend on your data. For example, if you had data from a store, you could compare the data from summer and winter. There would certainly be some different rules for transactions during the summer and the winter. This is just an idea. I don't know what kind of data you have.
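One possible way to compute such a prediction accuracy is sketched below in Python. The definition used here is an assumption and only one of several reasonable choices: a rule "fires" on a test transaction when its antecedent is contained in it, and the prediction counts as correct when the consequent is contained too. The rules and transactions are made-up toy values, not real results.

```python
def rule_accuracy(rules, observed):
    """Fraction of rule firings on the observed data whose consequent also holds.

    rules    : list of (antecedent, consequent) pairs of sets
    observed : list of transactions (sets of items)
    Returns None when no rule ever fires.
    """
    applicable = 0
    hits = 0
    for transaction in observed:
        for antecedent, consequent in rules:
            if antecedent <= transaction:
                applicable += 1
                if consequent <= transaction:
                    hits += 1
    return hits / applicable if applicable else None


# Hypothetical rule sets mined from two data files, and a third test file.
rules_v1 = [({1}, {2})]
rules_v2 = [({2}, {3})]
observed = [{1, 2, 3}, {1, 2, 3, 4}, {1, 2, 5, 6}, {2, 3}]

accuracy_v1 = rule_accuracy(rules_v1, observed)
accuracy_v2 = rule_accuracy(rules_v2, observed)
print("v1 error:", 1 - accuracy_v1)
print("v2 error:", 1 - accuracy_v2)
```

The % error mentioned in the question would then be 100 times one minus the accuracy, computed once per rule set on the same observed data.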

Best,

Philippe


Re: Data set

Posted by: Vathsala

Date: February 11, 2014 03:00AM

Hi,
Due to memory issues, I am first deriving the frequent itemsets using the FPGrowth algorithm. For association rule extraction, I am writing code that opens the file containing the frequent itemsets and their supports, and extracts association rules by calculating the confidence from the supports provided in that file (i.e., the frequent itemset file). Is this correct?

I also wanted to know whether the algorithms differ only in the way the frequent itemsets are mined, and whether mining association rules from these frequent itemsets is the same in all algorithms.

Regards
Vathsala


Re: Data set

Posted by: webmasterphilfv

Date: February 11, 2014 04:28AM

Hello,

Yes, in general, there are two steps to mine association rules:
1) discover frequent itemsets
2) generate association rules by using frequent itemsets found in Step 1.

For Step 1, there are several algorithms that can be used such as FPGrowth, Apriori, Eclat, etc.

For Step 2, it is always the same algorithm (the algorithm by Agrawal et al. 1993).

For Step 1, the input is a transaction database and the minsup threshold. The output is frequent itemsets and their support.

For Step 2, the input is the frequent itemsets with their support and the minconf threshold. Note that we don't need the minsup threshold for Step 2, since all itemsets found in Step 1 already respect it.
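To illustrate Step 2, here is a minimal Python sketch that takes frequent itemsets with their supports plus a minconf threshold and outputs rules. It is a brute-force enumeration of antecedents, not the optimized procedure from Agrawal et al., and the function name generate_rules is hypothetical. It assumes the input contains all frequent itemsets, so that the support of every antecedent can be looked up.

```python
from itertools import combinations


def generate_rules(frequent_itemsets, minconf):
    """frequent_itemsets maps frozenset -> support count.

    Returns a list of (antecedent, consequent, support, confidence).
    """
    rules = []
    for itemset, sup in frequent_itemsets.items():
        if len(itemset) < 2:
            continue  # a rule needs a non-empty antecedent and consequent
        for size in range(1, len(itemset)):
            for subset in combinations(sorted(itemset), size):
                antecedent = frozenset(subset)
                sup_antecedent = frequent_itemsets.get(antecedent)
                if sup_antecedent is None:
                    continue  # antecedent support unknown (incomplete input)
                conf = sup / sup_antecedent
                if conf >= minconf:
                    rules.append((antecedent, itemset - antecedent, sup, conf))
    return rules


# Frequent itemsets of the toy database from earlier in the thread
# (transactions 1 2 3 / 1 2 3 4 / 1 2 5 6 / 2 3) with a minimum support of 3.
frequent = {
    frozenset({1}): 3,
    frozenset({2}): 4,
    frozenset({3}): 3,
    frozenset({1, 2}): 3,
    frozenset({2, 3}): 3,
}
rules = generate_rules(frequent, minconf=0.8)
for antecedent, consequent, sup, conf in rules:
    print(sorted(antecedent), "==>", sorted(consequent), sup, conf)
```

Note that minsup never appears in the function, which matches the point made above about Step 2.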


Re: Data set for TRuleGrowth algorithm (sequential rule mining)

Posted by: Vathsala

Date: February 17, 2014 08:17PM

Hi,

My data has 19 records with 38 attributes, and I need to run it with a low minimum support such as 0.08. I have increased the maximum memory of the VM to 1500; the system does not accept more than this, but it seems that this memory is not enough. I used the Charm algorithm, but some of the required rules are not generated from the closed itemsets. I don't have a bigger server to run it on. Can you suggest any other way? Also, is there any implementation here where we can specify an item that must be in the frequent itemsets, so as to reduce memory consumption?

Regards
Vathsala


Re: Data set for TRuleGrowth algorithm (sequential rule mining)

Posted by: webmasterphilfv

Date: February 18, 2014 06:56AM

Hi,

If you are running out of memory, then there are a few solutions:
- Increase the memory (you have done it already).
- Increase the minsup threshold. You are using a very low threshold: 0.08 for 19 records means a minimum support of 1.52 transactions, so every itemset that appears in at least 2 transactions is frequent. With such a low minimum support, there may be too many itemsets. For example, if you had 19 identical transactions of 38 items each, you would get 2^38 - 1 frequent itemsets (about 275 billion). The lower minsup is, the more itemsets you get. It is usually recommended to start with a high minsup value and then lower it gradually.
- Add some constraints. As you said, a good idea would be to discover only itemsets containing some given items. This capability is not offered in SPMF, but you could add it very easily. You could either modify your input file to remove the items that you don't need, or modify the method that reads the file so that it ignores the items that you don't want to consider for frequent itemsets.
- Try the DCI_Closed algorithm, which produces the same output as Charm and may use less memory.
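The arithmetic and the input-filtering idea above can be sketched in Python as follows. The function names are hypothetical and this is not SPMF code; in particular, rounding the relative threshold up to a whole number of transactions is an assumption about how a relative minsup is turned into a count.

```python
import math


def absolute_minsup(relative_minsup, transaction_count):
    """Smallest number of transactions an itemset needs to be frequent."""
    # round() guards against float noise such as 0.1 * 30 = 3.0000000000000004.
    return math.ceil(round(relative_minsup * transaction_count, 9))


def filter_line(line, wanted_items):
    """Keep only the wanted items of one transaction line."""
    return " ".join(tok for tok in line.split() if int(tok) in wanted_items)


minsup_count = absolute_minsup(0.08, 19)  # 0.08 * 19 = 1.52, rounded up

# Worst case: 19 identical transactions of 38 items each.
# Every non-empty subset of the 38 items is then frequent.
worst_case_itemsets = 2 ** 38 - 1

print(minsup_count, worst_case_itemsets)
print(filter_line("1 2 5 6", {1, 2, 3}))
```

Filtering the input this way shrinks the search space before any mining starts, which is usually the cheapest of the options listed above.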

Besides, Charm does not generate rules. It generates closed itemsets.

Best,

Philippe

Edited 1 time(s). Last edit at 02/18/2014 06:58AM by webmasterphilfv.


Re: Data set for TRuleGrowth algorithm (sequential rule mining)

Posted by: vathsala

Date: February 18, 2014 08:05AM

Hi,

Actually, my data needs a low minimum support because it is rainfall data. If we analyze the data, we see that droughts and floods are very rare when rainfall is classified into normal, excess, deficit, flood and drought. Hence I need a solution. I also have a question: do classifiers and association rule miners train themselves similarly, or is one better than the other for prediction?

Thanks and Regards
Vathsala


Re: Data set for TRuleGrowth algorithm (sequential rule mining)

Posted by: webmasterphilfv

Date: February 18, 2014 08:43AM

The goal of a classifier is to classify new instances based on a set of training instances.

The goal of association rule mining is to discover association rules.

But yes, there are some classifiers that use association rules to perform classification, for example CBA, CPAR and CMAR. They work like other classifiers: the classifier is trained first and then used to perform classification. The papers on these classifiers show that they can provide good accuracy. Whether they have better accuracy than other classifiers or not depends on the data. In SPMF, there is no code for performing classification, but you could implement some if you need it.


Re: Data set for TRuleGrowth algorithm (sequential rule mining)

Posted by: vathsala

Date: February 18, 2014 08:14AM

Hi,

I am not able to see the corrected ID3 code here. Kindly send it to my email address.

Thanks
vathsala


Re: Data set for TRuleGrowth algorithm (sequential rule mining)

Posted by: webmasterphilfv

Date: February 18, 2014 08:19AM

I have moved the discussion about ID3 to a separate thread:

http://forum.ai-directory.com/read.php?5,1444

because ID3 is a different topic. It is better to create a new thread for each different topic that we discuss in the forum.
