frequent pattern and association rule

The Data Mining Forum

IMPORTANT: This is the old Data Mining forum.
I keep it online so that you can read the old messages.

Please post your new messages in the new forum: https://forum2.philippe-fournier-viger.com/index.php

Posted by: khairy

Date: September 04, 2013 02:38AM

I am confused about the two terms "frequent pattern" and "association rule".
Suppose I want to discover frequent patterns and association rules from the following transactions. I need help to get the output for each of the two terms separately.

TID Items

100 A C D
200 B C E
300 A B C E
400 B E

Secondly, do Apriori and FP-Growth both give frequent itemsets and association rules together?

Re: frequent pattern and association rule

Posted by: webmasterphilfv

Date: September 04, 2013 04:59AM

Hello,

I think you are referring to "frequent itemset" and "association rule".

I will explain. Association rules are usually found in two steps:

Step 1) Discover the frequent itemsets.

Input: a transaction database and a minsup threshold set by the user
Output: all sets of items (itemsets) that appear in at least minsup transactions of the database.

Step 2) Generate association rules by using frequent itemsets

Input: the frequent itemsets found in Step 1 + the minconf threshold set by the user
Output: all the association rules respecting the minsup and minconf thresholds.

So basically, you can see discovering frequent itemsets as an intermediary step to generate association rules.
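As a rough illustration of the two steps on the four transactions from the question, here is a minimal brute-force sketch in Python. The thresholds (MINSUP = 2 transactions, MINCONF = 0.75) are assumptions chosen for the example, and an itemset is treated as frequent when it appears in at least MINSUP transactions. This is not how Apriori or FPGrowth search; it only shows what each step outputs.

```python
from itertools import combinations

# Transactions from the question (TID -> items).
transactions = {
    100: {"A", "C", "D"},
    200: {"B", "C", "E"},
    300: {"A", "B", "C", "E"},
    400: {"B", "E"},
}
MINSUP = 2      # assumed: minimum number of transactions
MINCONF = 0.75  # assumed: minimum confidence


def support(itemset):
    """Number of transactions containing every item of `itemset`."""
    return sum(1 for items in transactions.values() if itemset <= items)


# Step 1: frequent itemsets (brute force over all candidate itemsets).
all_items = sorted(set().union(*transactions.values()))
frequent = {}
for size in range(1, len(all_items) + 1):
    for candidate in combinations(all_items, size):
        count = support(set(candidate))
        if count >= MINSUP:
            frequent[frozenset(candidate)] = count

# Step 2: rules X -> Y from each frequent itemset with at least two items.
# confidence(X -> Y) = support(X and Y together) / support(X)
rules = []
for itemset, count in frequent.items():
    for size in range(1, len(itemset)):
        for antecedent in combinations(sorted(itemset), size):
            antecedent = frozenset(antecedent)
            confidence = count / frequent[antecedent]
            if confidence >= MINCONF:
                consequent = itemset - antecedent
                rules.append((sorted(antecedent), sorted(consequent), count, confidence))

for itemset, count in frequent.items():
    print("itemset", sorted(itemset), "support", count)
for antecedent, consequent, count, confidence in rules:
    print("rule", antecedent, "->", consequent, "support", count, "confidence", confidence)
```

With these assumed thresholds, Step 1 yields 9 frequent itemsets (A, B, C, E, AC, BC, BE, CE, BCE) and Step 2 yields 5 rules (A -> C, B -> E, E -> B, BC -> E, CE -> B), each with confidence 1.0.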

Now, about the algorithm names.

FPGrowth is an algorithm to discover frequent itemsets (Step1)

Apriori is an algorithm to discover frequent itemsets (Step 1), and it also includes another algorithm to generate association rules (Step 2).

The algorithm to generate association rules from Apriori can also be applied with FPGrowth. Therefore, it is possible to also generate association rules with FPGrowth.

Actually, FPGrowth and Apriori (Step 1) generate the same result if you give them the same input. The only difference is HOW they generate the itemsets (which search strategy and which data structure they use internally). To generate the rules, they would use the same algorithm in Step 2.
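To make the "same result, different strategy" point concrete, here is a small Python sketch that finds frequent itemsets in two ways and compares the outputs. The first function is a simplified level-wise (Apriori-style) search. The second is a depth-first search over a vertical layout (Eclat-style), used here only as a short stand-in for a different strategy; it is not FPGrowth, which builds an FP-tree. The minsup value of 2 is an assumption.

```python
from itertools import combinations

transactions = [
    {"A", "C", "D"},
    {"B", "C", "E"},
    {"A", "B", "C", "E"},
    {"B", "E"},
]
MINSUP = 2  # assumed minimum support count


def apriori(db, minsup):
    """Level-wise search: build candidates of size k+1 from frequent k-itemsets."""
    def count(itemset):
        return sum(1 for t in db if itemset <= t)

    items = sorted(set().union(*db))
    level = {frozenset([i]): count({i}) for i in items}
    level = {s: c for s, c in level.items() if c >= minsup}
    result = dict(level)
    while level:
        candidates = set()
        for a, b in combinations(level, 2):
            union = a | b
            # Keep a candidate only if it is one item larger and all of its
            # subsets of the previous size are frequent (Apriori pruning).
            if len(union) == len(a) + 1 and all(
                frozenset(sub) in level for sub in combinations(union, len(a))
            ):
                candidates.add(union)
        level = {c: count(c) for c in candidates}
        level = {s: c for s, c in level.items() if c >= minsup}
        result.update(level)
    return result


def depth_first(db, minsup):
    """Depth-first search over a vertical layout (item -> set of transaction ids)."""
    tidsets = {}
    for tid, t in enumerate(db):
        for item in t:
            tidsets.setdefault(item, set()).add(tid)
    result = {}

    def extend(prefix, prefix_tids, remaining):
        for i, item in enumerate(remaining):
            tids = prefix_tids & tidsets[item]
            if len(tids) >= minsup:
                itemset = prefix | {item}
                result[frozenset(itemset)] = len(tids)
                extend(itemset, tids, remaining[i + 1:])

    extend(set(), set(range(len(db))), sorted(tidsets))
    return result


same = apriori(transactions, MINSUP) == depth_first(transactions, MINSUP)
print("same itemsets and supports:", same)
```

Both functions return the same dictionary of itemsets and support counts; only the order in which the search space is explored and the way supports are counted differ.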

If you want a good introduction to itemset and association rule mining, I suggest reading this great chapter from the book "Introduction to Data Mining":

http://www-users.cs.umn.edu/~kumar/dmbook/ch6.pdf

It gives a lot more information. For example, it explains why it is useful to find itemsets first to generate the rules instead of trying to generate the rules directly, etc.

Hope this helps,

Philippe

Edited 2 time(s). Last edit at 09/04/2013 05:01AM by webmasterphilfv.

Re: frequent pattern and association rule

Posted by: khairy

Date: September 05, 2013 09:54AM

Thank you very much for your helpful explanation. The chapter is also useful and clear.

Regards,

Re: frequent pattern and association rule

Posted by: khairy

Date: October 06, 2013 08:16AM

Dear all,

When we design a new algorithm and measure its execution time and memory usage, is that enough to justify that our algorithm is better, or are there other criteria that can be used to evaluate it against the well-known algorithms?

Re: frequent pattern and association rule

Posted by: webmasterphilfv

Date: October 06, 2013 08:32AM

For pattern mining, the most important criteria are execution time and memory usage.

But when you evaluate your new algorithm, you generally also want to test scalability (how your algorithm performs as the size of the data increases). For this, you may want to use synthetic datasets.

Also, some algorithms may perform better on some types of datasets and worse on others. For example, you would generally want to test your algorithm on some sparse datasets and also on some dense datasets to see how it behaves. In my opinion, it is important to use several datasets (not just one).
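As a minimal sketch of such an evaluation, the Python script below measures running time and peak memory on synthetic datasets of growing size, once sparse and once dense. Everything here is an assumption made for illustration: the generator `synthetic_db`, the placeholder algorithm `count_frequent_pairs` (to be replaced by the algorithm under test), the sizes, the densities and the minsup of 10% of the transactions. `tracemalloc` only tracks memory allocated by Python itself.

```python
import random
import time
import tracemalloc
from itertools import combinations


def synthetic_db(n_transactions, n_items, density, seed=0):
    """Random transactions; each item is included with probability `density`."""
    rng = random.Random(seed)
    return [
        {item for item in range(n_items) if rng.random() < density}
        for _ in range(n_transactions)
    ]


def count_frequent_pairs(db, minsup):
    """Placeholder for the algorithm under test: counts frequent 2-itemsets."""
    counts = {}
    for t in db:
        for pair in combinations(sorted(t), 2):
            counts[pair] = counts.get(pair, 0) + 1
    return sum(1 for c in counts.values() if c >= minsup)


def measure(algorithm, db, minsup):
    """Return (result, seconds, peak bytes of Python allocations)."""
    tracemalloc.start()
    start = time.perf_counter()
    result = algorithm(db, minsup)
    seconds = time.perf_counter() - start
    _, peak = tracemalloc.get_traced_memory()
    tracemalloc.stop()
    return result, seconds, peak


report = []
for label, density in [("sparse", 0.05), ("dense", 0.5)]:
    for n in [500, 1000, 2000]:  # growing sizes to observe scalability
        db = synthetic_db(n, n_items=40, density=density)
        found, seconds, peak = measure(count_frequent_pairs, db, minsup=n // 10)
        report.append((label, n, found, seconds, peak))
        print(f"{label:6} n={n:5} patterns={found:4} time={seconds:.4f}s peak={peak} bytes")
```

In a real comparison, each measurement would normally be repeated several times and run on real datasets as well, with the same minsup values for every algorithm being compared.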

Need Supermarket database

Posted by: saravana sai

Date: October 07, 2013 08:47PM

Hello sir,
I am a student doing a project on data mining. In my project, I am analyzing the buying patterns of customers, and for that I need a real supermarket database. Can I find and download such a database from any site? Please help me find one.

Re: Need Supermarket database

Posted by: webmasterphilfv

Date: October 08, 2013 04:42PM

I know two datasets.

One is named "retail". It contains anonymized transactions from customers. It can be downloaded here: http://fimi.ua.ac.be/data/

Another one is named "foodmart". It is provided by Microsoft and distributed with SQL Server, if I remember correctly.

Maybe there are some others available on the web.

Best,

Philippe

Re: frequent pattern and association rule

Posted by: khairy

Date: October 10, 2013 11:34AM

Dear Sir,

I am now doing experiments on different datasets, and I am looking for a redundant dataset, I mean a dataset in which a full transaction occurs several times.

For example:

T1 1 3 5 6
T2 2 5 5
T3 1 3 5 6
T4 2 3 5 6
T5 1 3 5 6

T1, T3 and T5 contain the same items.

Re: frequent pattern and association rule

Posted by: philippe

Date: October 10, 2013 02:33PM

Hi,

For a transaction database containing redundant transactions, see the datasets section of the SPMF website ( http://www.philippe-fournier-viger.com/spmf/index.php?link=datasets.php ). You can download Chess and pUMSB, for example.

Mushrooms probably also has redundant transactions. You can check it.
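One way to check a dataset for redundant transactions is to count how often each transaction occurs. The Python sketch below does this for the five example transactions from the question, assuming a text format with one transaction per line and items separated by spaces. The function name `duplicate_transactions` is made up for this example.

```python
from collections import Counter

# Example from the question above, one transaction per line (space-separated items).
lines = [
    "1 3 5 6",
    "2 5 5",
    "1 3 5 6",
    "2 3 5 6",
    "1 3 5 6",
]


def duplicate_transactions(lines):
    """Map each transaction that occurs more than once to its number of occurrences."""
    # Items are sorted and de-duplicated so that item order does not matter.
    counts = Counter(tuple(sorted(set(line.split()))) for line in lines if line.strip())
    return {t: c for t, c in counts.items() if c > 1}


print(duplicate_transactions(lines))
```

For a dataset file in the same format, the lines could be read with `open(path)` and passed to the function in the same way. An empty result means that no transaction is repeated.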

Best,

Philippe

Re: frequent pattern and association rule

Posted by: khairy

Date: November 01, 2013 03:49PM

The Mushrooms dataset does not have redundant transactions. I hope we can find more datasets with this characteristic.

Re: frequent pattern and association rule

Posted by: webmasterphilfv

Date: November 01, 2013 07:22PM

OK. Thanks for letting us know. It may be useful for other researchers.

Best,

Philippe

Re: frequent pattern and association rule

Posted by: Rhana

Date: November 10, 2013 11:15AM

Where can I download the mushrooms and chess datasets?

Re: frequent pattern and association rule

Posted by: webmasterphilfv

Date: November 10, 2013 01:24PM

Here : http://www.philippe-fournier-viger.com/spmf/index.php?link=datasets.php

Philippe

Re: frequent pattern and association rule

Posted by: Arpit Varma

Date: November 25, 2013 07:08AM

I was performing a comparative study of the Apriori and FP-Growth algorithms implemented in SPMF for my college project. I took a dataset of 10,000 transactions, with each transaction having up to 100 elements. It turns out that Apriori runs more efficiently than FP-Growth in terms of running time and memory usage.
Can someone please briefly explain why FP-Growth is not running faster than Apriori? Should I change the input data for better results?

Re: frequent pattern and association rule

Posted by: sedighe

Date: November 30, 2013 05:29AM

Hello. I have a question! I downloaded and ran spmf.jar. This application does not have some algorithms, such as SPADE, in its default mode, and I want to add them. How can I add algorithms to the list of algorithms of SPMF?

Re: frequent pattern and association rule

Posted by: Philippe

Date: November 30, 2013 04:13PM

I answered you in the other thread. Please don't duplicate your messages.

Re: frequent pattern and association rule

Posted by: khairy

Date: December 13, 2013 08:43PM

I am asking about the time and space complexity of the following algorithms:

Apriori
FP-Growth (is it O(n²)?)
Eclat
H-Mine

Re: frequent pattern and association rule

Posted by: Philippe

Date: December 14, 2013 03:29AM

What is n ?

You could check the respective papers about these algorithms for information about the complexity.

The execution time depends on many factors:
- the number of transactions
- the length of transactions
- whether the data is dense or sparse
- the number of different items
- the minsup threshold (and the number of patterns in the data)
- ...

So depending on your data, it is possible that sometimes an algorithm is faster but that on some other data another algorithm is faster because of the dataset characteristics.
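Several of these factors can be measured directly on a dataset before running any algorithm. The following Python sketch computes a few of them for the small database from the start of this thread. The function name `describe` and the density definition (average transaction length divided by the number of distinct items) are assumptions made for this example.

```python
def describe(db):
    """Basic characteristics of a transaction database (list of item sets)."""
    n_transactions = len(db)
    distinct_items = set().union(*db) if db else set()
    avg_length = sum(len(t) for t in db) / n_transactions if db else 0.0
    # Density: average transaction length divided by the number of distinct items.
    density = avg_length / len(distinct_items) if distinct_items else 0.0
    return {
        "transactions": n_transactions,
        "distinct_items": len(distinct_items),
        "avg_length": avg_length,
        "density": density,
    }


db = [{"A", "C", "D"}, {"B", "C", "E"}, {"A", "B", "C", "E"}, {"B", "E"}]
stats = describe(db)
print(stats)
```

For this database the sketch reports 4 transactions, 5 distinct items, an average length of 3.0 and a density of 0.6. A value close to 1 suggests a dense dataset and a value close to 0 a sparse one.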

Re: frequent pattern and association rule

Posted by: khairy

Date: March 29, 2014 05:26AM

Dear all,

I am searching for a retail or supermarket dataset. As shown by the datasets available in FIMI or other dataset repositories, they are written in a numeric format, for example:

38 39 47 48
38 39 48 49
23 56 27
56 57 58 45
32 41 59 60

This means the data is blind. If I want to find associations, it is better to say "if a customer buys sugar and tea, he also buys milk" instead of "if a customer buys 23 and 56, he also buys 27".

Please, I need any retail, supermarket or food dataset that describes the item names in an external file, for example: 23 sugar, 56 tea, 27 milk.

An example is the dataset used in this paper:

Association rule mining using binary particle swarm optimization

Thanks in advance.

Re: frequent pattern and association rule

Posted by: webmasterphilfv

Date: March 29, 2014 06:39AM

I think you can use the FoodMart 2000 dataset. It should have all that information. By searching on Google you can find the SQL database for Foodmart and convert it to text or whatever format you need.
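Once the item names are available in a separate file, translating rules from item ids to names is straightforward. The Python sketch below assumes a hypothetical mapping file with one "id name" pair per line, using the three example items from the question. The actual FoodMart export may be organized differently.

```python
# Hypothetical mapping file content: one "id name" pair per line.
mapping_text = """23 sugar
56 tea
27 milk"""

names = {}
for line in mapping_text.splitlines():
    item_id, name = line.split(maxsplit=1)
    names[int(item_id)] = name


def rule_to_text(antecedent, consequent):
    """Render a rule over item ids with item names; unknown ids are kept as numbers."""
    def label(ids):
        return " and ".join(names.get(i, str(i)) for i in ids)
    return f"if a customer buys {label(antecedent)}, he also buys {label(consequent)}"


print(rule_to_text([23, 56], [27]))
```

This prints "if a customer buys sugar and tea, he also buys milk". The mining itself can still run on the numeric ids, and the names are only needed when presenting the results.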

Re: frequent pattern and association rule

Posted by: Tanri

Date: April 03, 2014 05:57AM

Dear Sir,

I have a question. I am now working on web usage mining with session data, for example:

       url1  url2  url3  url4  url5
sess1  1     0     1     0     1
sess2  0     1     1     0     1
sess3  0     1     0     0     1
sess4  1     1     1     0     0
sess5  1     0     1     0     0

I want to cluster the sessions into meaningful groups. The question is: how does the K-Means initialization process work for binary data? How do I choose the random initial centroids (cluster centres)?

Re: frequent pattern and association rule

Posted by: webmasterphilfv

Date: April 03, 2014 05:22PM

The original K-Means chooses the initial centroids randomly, so you don't need to choose them yourself as a user.

So if a vector has n elements (e.g. v = [v1, v2, ... vn]), K-Means will use a random number generator to determine the value of each element of the vector.

By the way, it does not matter whether the data is binary or not. If you want to generate a random binary value (0 or 1), you just need to generate a random integer and then use the modulo operator (modulo 2) to convert it to 0 or 1.
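As a small illustration, here is a Python sketch of random initialization for the session data from the question. The first function draws each centroid element at random as 0 or 1, as described above. The second shows a common alternative, which is to pick k of the data points themselves as initial centroids. The function names, k = 2 and the fixed seed are assumptions made for this example.

```python
import random

# Session data from the question: one row per session, one column per URL.
sessions = [
    [1, 0, 1, 0, 1],
    [0, 1, 1, 0, 1],
    [0, 1, 0, 0, 1],
    [1, 1, 1, 0, 0],
    [1, 0, 1, 0, 0],
]


def random_binary_centroids(k, n_dims, rng):
    """Each centroid element is drawn at random as 0 or 1."""
    return [[rng.randint(0, 1) for _ in range(n_dims)] for _ in range(k)]


def sample_centroids(data, k, rng):
    """Alternative: use k distinct data points as the initial centroids."""
    return [list(row) for row in rng.sample(data, k)]


rng = random.Random(42)  # fixed seed only to make the run repeatable
k = 2                    # assumed number of clusters
print(random_binary_centroids(k, len(sessions[0]), rng))
print(sample_centroids(sessions, k, rng))
```

Note that after the first iteration the centroids are averages of the assigned sessions, so they generally become real-valued vectors even when the data is binary.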

Re: frequent pattern and association rule

Posted by: sahil

Date: March 30, 2014 10:35PM

I have created a new Java project for FP-Growth and imported all the files successfully, but I am getting this error when running it. Please help me solve it.

Exception in thread "main" java.lang.NullPointerException
at fp.MainTestAllAssociationRules_FPGrowth_saveToFile.fileToPath(MainTestAllAssociationRules_FPGrowth_saveToFile.java:42)
at fp.MainTestAllAssociationRules_FPGrowth_saveToFile.main(MainTestAllAssociationRules_FPGrowth_saveToFile.java:19)

Re: frequent pattern and association rule

Posted by: webmasterphilfv

Date: March 31, 2014 04:53AM

I have answered you in the other thread. Please don't double post your message.
