Re: high dimensional data with frequent itemsets

The Data Mining Forum

high dimensional data with frequent itemsets

Posted by: david@gmail.com

Date: November 07, 2018 09:19AM

Hi,

Is itemset mining beneficial on high-dimensional data (such as gene datasets)?
For example, can bioinformatics datasets be used in data mining (association rule mining)?

kind regards
David

Re: high dimensional data with frequent itemsets

Posted by: webmasterphilfv

Date: November 08, 2018 05:22AM

Hi David,

If you have high-dimensional data (many attributes), itemset mining can still be applied, because each itemset generally involves only a few attributes. Even with many attributes, frequent itemset mining only finds the small sets of values that often appear together, each using a subset of the attributes.
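To make this concrete, here is a minimal sketch of one common way to treat attribute data as transactions: each "attribute=value" pair becomes an item. The gene names, the "high"/"low" values and the helper `to_transaction` are hypothetical and only serve as an illustration; real data would first have to be discretized in some way.

```python
# Hypothetical records: every row has a value for every attribute.
records = [
    {"geneA": "high", "geneB": "low", "geneC": "high"},
    {"geneA": "high", "geneB": "low", "geneC": "low"},
    {"geneA": "high", "geneB": "high", "geneC": "high"},
]

def to_transaction(record):
    """Encode one record as a set of 'attribute=value' items."""
    return frozenset(f"{attr}={value}" for attr, value in record.items())

transactions = [to_transaction(r) for r in records]

# An itemset that involves only two of the three attributes.
pattern = frozenset({"geneA=high", "geneB=low"})
support = sum(1 for t in transactions if pattern <= t)
print(support)  # 2
```

The pattern ignores geneC entirely, which is the point: an itemset only has to mention the few attributes it is about.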

Now, with high-dimensional data, a problem can be that the database is very dense, in other words, that transactions are very similar to each other. In that case, the search space can become very large, and algorithms may take a long time to check all the possibilities. For example, if there is a frequent itemset with 10 items, then all of its 2^10 - 1 = 1023 non-empty subsets (counting the itemset itself) are also frequent. Thus, the more attributes you have, the more likely transactions are to be similar, and the more itemsets there can be, so the search space can be very large.
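The subset count can be checked on a toy example. This is only a sketch on a made-up dense database (five transactions that all share the same 10 items); the `support` helper and the threshold of 5 are assumptions chosen for the illustration.

```python
from itertools import combinations

def support(itemset, transactions):
    """Number of transactions that contain every item of the itemset."""
    return sum(1 for t in transactions if itemset <= t)

# Hypothetical dense database: 5 transactions sharing the same 10 items,
# each with one extra item of its own.
items = frozenset(range(10))
transactions = [items | {100 + i} for i in range(5)]

min_support = 5  # assumed threshold: the itemset must appear in all 5

# Enumerate every non-empty subset of the 10 shared items.
subsets = [
    frozenset(c)
    for k in range(1, len(items) + 1)
    for c in combinations(sorted(items), k)
]
frequent_subsets = [s for s in subsets if support(s, transactions) >= min_support]
print(len(frequent_subsets))  # 1023, i.e. 2**10 - 1
```

With 20 shared items the same count would already be over a million, which is why dense data is hard for itemset mining.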

There are, however, some ways to deal with a large search space. One is to set the algorithm's parameter(s), such as the minimum support threshold, to greater values. Another is to use constraints, such as not finding itemsets with more than 4 items. This can make an algorithm much more efficient, so that it can run even on high-dimensional data.
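As a rough illustration of these two controls, here is a small level-wise (Apriori-style) miner written from scratch. It is a sketch, not the implementation of any particular library: the function name `frequent_itemsets` and its parameters `min_support` (an absolute count) and `max_len` are assumptions made for this example.

```python
from itertools import combinations

def frequent_itemsets(transactions, min_support, max_len=None):
    """Level-wise search; returns {itemset: support count}.

    min_support: minimum number of transactions containing the itemset.
    max_len: optional limit on the number of items per itemset.
    """
    transactions = [frozenset(t) for t in transactions]

    # Level 1: count single items.
    counts = {}
    for t in transactions:
        for item in t:
            key = frozenset([item])
            counts[key] = counts.get(key, 0) + 1
    level = {s: c for s, c in counts.items() if c >= min_support}
    result = dict(level)

    k = 1
    while level and (max_len is None or k < max_len):
        k += 1
        # Candidates of size k: unions of two frequent (k-1)-itemsets
        # whose (k-1)-subsets are all frequent.
        candidates = set()
        for a, b in combinations(list(level), 2):
            union = a | b
            if len(union) == k and all(
                frozenset(s) in level for s in combinations(union, k - 1)
            ):
                candidates.add(union)
        # Keep only the candidates that reach the support threshold.
        level = {}
        for cand in candidates:
            n = sum(1 for t in transactions if cand <= t)
            if n >= min_support:
                level[cand] = n
        result.update(level)
    return result

# Hypothetical dense database: 6 transactions sharing the same 8 items.
shared = set(range(8))
db = [shared | {100 + i} for i in range(6)]

all_patterns = frequent_itemsets(db, min_support=6)
short_patterns = frequent_itemsets(db, min_support=6, max_len=4)
print(len(all_patterns))    # 255
print(len(short_patterns))  # 162
```

The length limit stops the search after level 4, so the larger itemsets are never generated or counted. Raising `min_support` prunes in a similar way, because fewer itemsets survive at each level.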

After that, it also depends on what you are doing. There are different algorithms for finding different types of patterns. Depending on your application, one type of pattern may be more useful than others and let you discover more interesting knowledge from your data.

Hope this helps.

Regards,
Philippe

Re: high dimensional data with frequent itemsets

Posted by: david

Date: November 12, 2018 07:09AM

Thanks so much,

It is really helpful.
I just wonder whether frequent itemsets generated from bioinformatics datasets are beneficial.

For example, in gene datasets, what knowledge can we get from the resulting frequent itemsets?

Kind regards
David

Re: high dimensional data with frequent itemsets

Posted by: webmasterphilfv

Date: November 12, 2018 11:27PM

Hi David,

Glad it is helpful. That is a good question. I don't know much about bioinformatics, so I cannot really say much about that. But some people may have used frequent itemsets in bioinformatics before, so you could have a look at that. In my opinion, it could perhaps be used to find correlations in the data that provide insights, for example about related genes. This could then help people investigate these genes in further research. But as I said, I am no expert in that field ;-) Just trying to come up with some ideas ;-)

Best regards,

Philippe
