The Data Mining Forum
IMPORTANT: This is the old Data Mining forum.
I keep it online so that you can read the old messages.

Please post your new messages in the new forum: https://forum2.philippe-fournier-viger.com/index.php
 
high dimensional data with frequent itemsets
Posted by: david@gmail.com
Date: November 07, 2018 09:19AM

Hi,

Is itemset mining beneficial on high-dimensional data (like gene datasets)?
For example, can bioinformatics datasets be used in data mining (association rule mining)?




kind regards
David

Re: high dimensional data with frequent itemsets
Date: November 08, 2018 05:22AM

Hi David,

If you have high-dimensional data (many attributes), itemset mining can still be applied, because each itemset generally involves only a few attributes. Thus, even with many attributes, frequent itemset mining will only find the small sets of values that often appear together, each using a subset of the attributes.

Now, with high-dimensional data, a problem could be that the database is very dense, or in other words that transactions are very similar to each other. If that is the case, the search space can become very large and algorithms may take a lot of time to check all the possibilities. For example, if there is a frequent itemset with 10 items, then all of its 2^10 - 1 = 1023 non-empty subsets (counting the itemset itself) are also frequent. Thus, the more attributes you have, the more similar transactions are likely to be, and the more itemsets there could be... and thus the search space can be very large.
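To make the subset count concrete, here is a small Python sketch. It is only an illustration: the toy database and the helper names `nonempty_subsets` and `support` are made up for this example and do not come from any particular library.

```python
from itertools import combinations

def nonempty_subsets(itemset):
    """Return every non-empty subset of `itemset` as a frozenset."""
    items = sorted(itemset)
    return [
        frozenset(combo)
        for size in range(1, len(items) + 1)
        for combo in combinations(items, size)
    ]

def support(itemset, transactions):
    """Count the transactions that contain every item of `itemset`."""
    return sum(1 for t in transactions if itemset <= t)

# Toy dense database (made up): items 0..9 appear in every transaction.
full = frozenset(range(10))
transactions = [full | {10}, full | {11}, full | {12}]

subsets = nonempty_subsets(full)
print(len(subsets))  # 2**10 - 1 = 1023

# No subset can be rarer than the itemset that contains it.
assert all(support(s, transactions) >= support(full, transactions) for s in subsets)
```

So a single frequent itemset of 10 items already forces 1023 frequent itemsets into the output, and the count doubles with each additional item.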

There are, however, some solutions to deal with a large search space. One is to set the parameter(s), such as the minimum support threshold, to greater values. Another is to use constraints, such as not finding itemsets with more than 4 items. This makes an algorithm much more efficient, and it will be able to run even on high-dimensional data.
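A minimal sketch of these two ideas in Python, assuming the parameter in question is a minimum support count. This is a simple level-wise (Apriori-style) search written for illustration only; the function `frequent_itemsets`, its parameters `min_support` and `max_length`, and the toy transactions are all hypothetical, and a real tool would be far more optimized.

```python
from itertools import combinations

def frequent_itemsets(transactions, min_support, max_length):
    """Level-wise search with a support threshold and a cap on itemset size."""
    transactions = [frozenset(t) for t in transactions]

    # Level 1: count single items and keep the frequent ones.
    counts = {}
    for t in transactions:
        for item in t:
            key = frozenset([item])
            counts[key] = counts.get(key, 0) + 1
    current = {s: c for s, c in counts.items() if c >= min_support}
    result = dict(current)

    size = 1
    while current and size < max_length:
        size += 1
        # Build candidates of the next size; keep only those whose
        # (size - 1)-item subsets are all frequent.
        candidates = set()
        for a, b in combinations(list(current), 2):
            union = a | b
            if len(union) == size and all(
                frozenset(sub) in current for sub in combinations(union, size - 1)
            ):
                candidates.add(union)
        # Count the surviving candidates against the database.
        current = {}
        for cand in candidates:
            count = sum(1 for t in transactions if cand <= t)
            if count >= min_support:
                current[cand] = count
        result.update(current)
    return result

# Toy data (made up for this example).
transactions = [
    {"a", "b", "c", "d"},
    {"a", "b", "c"},
    {"a", "b", "d"},
    {"a", "c", "d"},
    {"b", "c"},
]

print(len(frequent_itemsets(transactions, min_support=2, max_length=4)))  # 13
print(len(frequent_itemsets(transactions, min_support=3, max_length=4)))  # 8
print(len(frequent_itemsets(transactions, min_support=2, max_length=2)))  # 10
```

On this tiny example, raising the threshold from 2 to 3 cuts the output from 13 itemsets to 8, and capping the size at 2 items cuts it from 13 to 10. On dense data the same two knobs prune far more.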

After that, it also depends on what you are doing. There are different algorithms to find different types of patterns. Depending on your application, one type of pattern may be more useful than others and allow you to discover more interesting knowledge from your data.

Hope this helps.

Regards,
Philippe

Re: high dimensional data with frequent itemsets
Posted by: david
Date: November 12, 2018 07:09AM

Thanks so much,


It is really helpful.
I just wonder whether frequent itemsets generated from bioinformatics datasets are beneficial.

For example, in gene datasets, what knowledge can we get from the resulting frequent itemsets?

Kind regards
David

Re: high dimensional data with frequent itemsets
Date: November 12, 2018 11:27PM

Hi David,

Glad it is helpful. That is a good question. I don't know much about bioinformatics, so I cannot really say much about that. But maybe some people have used frequent itemsets in bioinformatics before... You could perhaps have a look at that. In my opinion, it could be used to find correlations in the data that provide insights, for example about related genes. Then perhaps this could help some people to investigate these genes in further research. But as I said, I am no expert in that field ;-) Just try to find some ideas ;-)

Best regards,

Philippe



This forum is powered by Phorum and provided by P. Fournier-Viger (© 2012).