Re: high dimensional data with frequent itemsets
Date: November 08, 2018 05:22AM
Hi David,
If you have high-dimensional data (many attributes), itemset mining can still be applied, because each itemset generally involves only a few attributes. Thus, even with many attributes, frequent itemset mining will only find the small sets of values that often appear together, each using a subset of the attributes.
Now, with high-dimensional data, a problem could be that the database is very dense, in other words that transactions are very similar to each other. If that is the case, the search space can become very large and algorithms may take a long time to check all the possibilities. For example, if there is a frequent itemset with 10 items, then all of its 2^10 - 1 = 1023 non-empty subsets (counting the itemset itself) are also frequent. Thus, the more attributes you have, the more likely transactions are to be similar, and the more frequent itemsets there could be, so the search space can be very large.
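To make the 2^10 - 1 figure concrete, here is a small Python sketch. The toy transactions and the helper names `nonempty_subsets` and `support` are made up for illustration; they do not come from any particular library. It enumerates the non-empty subsets of a 10-item itemset and checks that each one is at least as frequent as the full itemset.

```python
from itertools import combinations

def nonempty_subsets(items):
    """Yield every non-empty subset of `items` as a frozenset."""
    items = sorted(items)
    for k in range(1, len(items) + 1):
        for combo in combinations(items, k):
            yield frozenset(combo)

def support(itemset, transactions):
    """Count the transactions that contain every item of `itemset`."""
    return sum(1 for t in transactions if itemset <= t)

# Hypothetical toy database: one 10-item itemset that appears in 3 transactions.
big = frozenset(range(10))
transactions = [big, big, big | {99}, frozenset({0, 1})]

subsets = list(nonempty_subsets(big))
min_sup = support(big, transactions)

# Any transaction containing `big` also contains each of its subsets,
# so no subset can have a lower support than `big` itself.
all_frequent = all(support(s, transactions) >= min_sup for s in subsets)

print(len(subsets), min_sup, all_frequent)
```

Each extra item in a long frequent itemset doubles the number of subsets, which is why dense data makes the search space grow so quickly.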
There are, however, some solutions to deal with a large search space. One is to set the parameter(s) to greater values (for example, a higher minimum support threshold), and another is to use constraints, such as not finding itemsets having more than 4 items. This will make an algorithm much more efficient, and it will be able to run even on high-dimensional data.
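As a rough illustration of both ideas, here is a minimal level-wise (Apriori-style) sketch in Python. It is only a sketch under stated assumptions, not the implementation of any particular tool: the function name `frequent_itemsets`, the parameters `min_support` (an absolute count) and `max_len`, and the toy transactions are all hypothetical.

```python
from itertools import combinations

def frequent_itemsets(transactions, min_support, max_len=None):
    """Return {itemset: support} for itemsets found in at least
    `min_support` transactions, with at most `max_len` items if given."""
    transactions = [frozenset(t) for t in transactions]

    # Level 1: count single items.
    counts = {}
    for t in transactions:
        for item in t:
            key = frozenset([item])
            counts[key] = counts.get(key, 0) + 1
    current = {s: c for s, c in counts.items() if c >= min_support}
    result = dict(current)

    k = 1
    # Stop early when the length constraint is reached.
    while current and (max_len is None or k < max_len):
        k += 1
        # Join frequent (k-1)-itemsets; keep a candidate only if all of
        # its (k-1)-subsets are frequent.
        candidates = set()
        for a, b in combinations(list(current), 2):
            union = a | b
            if len(union) == k and all(
                frozenset(sub) in current for sub in combinations(union, k - 1)
            ):
                candidates.add(union)
        # Count the support of each surviving candidate.
        current = {}
        for c in candidates:
            n = sum(1 for t in transactions if c <= t)
            if n >= min_support:
                current[c] = n
        result.update(current)
    return result

# Hypothetical dense toy database over four items.
transactions = [
    {"a", "b", "c", "d"}, {"a", "b", "c"}, {"a", "b", "d"},
    {"a", "c", "d"}, {"b", "c", "d"}, {"a", "b", "c", "d"},
]

print(len(frequent_itemsets(transactions, min_support=2)))             # 15
print(len(frequent_itemsets(transactions, min_support=5)))             # 4
print(len(frequent_itemsets(transactions, min_support=2, max_len=2)))  # 10
```

On this toy data, raising the threshold from 2 to 5 cuts the result from 15 itemsets to 4, and capping the length at 2 items stops the search before any 3-item candidates are generated.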
After that, it also depends on what you are doing. There are different algorithms for finding different types of patterns. Depending on your application, one type of pattern may be more useful than others and allow you to discover more interesting knowledge from your data.
Hope this helps.
Regards,
Philippe