Re: Non transactional database
Date: August 18, 2020 09:20AM
Good evening,
Welcome to the forum. Happy that the software is useful, and thanks for using it!
I will try to give some comments/suggestions below.
> I am aware the association rules are applied
> mainly to Transactional Databases, however I was
> wondering if I could use it to find association or
> patterns among features of a database.
Yes, indeed. A transaction database can be viewed as a table with attributes. For example, for a categorical attribute like Gender = {male, female}, you can create two items: one for male and another for female. If an attribute is numeric, like age, it can be discretized into ranges, for example: 1 to 18 years old, 25 to 35 years old, etc. That is one way to prepare your data, and you can then apply itemset mining or association rule mining to it.
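To make that preparation step concrete, here is a minimal Python sketch. The records, the age bins and the function names are made up for the example; only the idea (one item per categorical value, one item per numeric range) comes from the explanation above. Each output line is one transaction, written as positive integers separated by spaces and sorted in ascending order, which is the text format that SPMF reads for transaction databases.

```python
def age_bin(age):
    """Map a numeric age to a range label (the bins are illustrative only)."""
    if age <= 18:
        return "age=1-18"
    if age <= 35:
        return "age=19-35"
    return "age=36+"

def encode(records):
    """Turn attribute/value records into transactions of integer item ids."""
    item_ids = {}  # item label -> integer id, assigned the first time it is seen
    transactions = []
    for rec in records:
        labels = ["gender=" + rec["gender"], age_bin(rec["age"])]
        ids = sorted(item_ids.setdefault(lab, len(item_ids) + 1) for lab in labels)
        transactions.append(ids)
    return transactions, item_ids

# Hypothetical rows of a table with two attributes.
records = [
    {"gender": "male", "age": 17},
    {"gender": "female", "age": 30},
    {"gender": "male", "age": 30},
]
transactions, item_ids = encode(records)
lines = [" ".join(map(str, t)) for t in transactions]
print("\n".join(lines))
```

Keeping the item_ids dictionary is useful afterwards, to translate the integers in the discovered patterns back to readable labels.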
> I have got two different datasets, in the first
> one I have a dataset with many features describing
> companies, I would encode these features as items
> and try to find association among features.
Sounds like a good idea! You could find associations between some features. You could try itemset mining or association rule mining.
If you want to find something specific to some item, you could try the TopKClassRules algorithm. It finds rules of the form X --> c, where "c" is an item that you choose using the "Fixed consequent items" optional parameter. Thus, for example, you can find rules that involve some specific attribute value "c". Maybe it would be useful for you.
>
> The second one is actually a transactional
> database. However, it only has one item every
> purchase (it is an expensive item). I was thinking
> about using the items features (color, size, etc)
> as well as the buyer features (age, sex) to
> characterize patterns in the data. I've come up
> with the idea that these both set of features,
> could be treated as a set of items every
> transaction, and it could help me explain customer
> features and item features highly associated.
Yes, this is a good idea. You can indeed encode features as items in transactions. Then you will be able to find associations or itemsets containing items with buyer features and item features. Maybe you could get some interesting results.
However, a drawback is that you could find some patterns that only contain buyer features, without any product or item features. Would that still be interesting for you? If you want to avoid this problem and make sure that a pattern contains some item feature, some algorithms let you specify that you are searching for a specific item (like TopKClassRules, which I mentioned before). Or you can just search for everything and look at what you find.
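If you choose to search for everything, the patterns can also be filtered afterwards. Below is a small Python sketch of such a post-filter, which keeps only the patterns that mix at least one buyer feature with at least one product feature. It is not an SPMF feature; the item labels and the function name are hypothetical.

```python
def mixes_both(pattern, buyer_items, product_items):
    """True if the pattern has at least one buyer item and one product item."""
    items = set(pattern)
    return bool(items & buyer_items) and bool(items & product_items)

# Hypothetical item labels for the two groups of features.
buyer_items = {"age=19-35", "sex=F"}
product_items = {"color=red", "size=L"}

# Hypothetical patterns, as they could come out of an itemset mining run.
patterns = [
    ["age=19-35", "sex=F"],                # buyer features only
    ["sex=F", "color=red"],                # mixed
    ["color=red", "size=L"],               # product features only
    ["age=19-35", "size=L", "color=red"],  # mixed
]
kept = [p for p in patterns if mixes_both(p, buyer_items, product_items)]
print(kept)
```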
>
> I also would like to check for sequential patterns
> (now, considering the item itself as a product),
> is this item sold more in the summer/ winter. But
> again it is one item per transaction, is it
> possible to do sequential pattern mining in this
> case?
If you have a single sequence of events, you can do episode mining.
If you have many sequences and you want to find subsequences that appear in many of them, then it is sequential pattern mining or sequential rule mining.
So I think that whether you can apply those depends on whether you can model your data as one sequence or as many sequences.
If you consider that each customer is a sequence, and a customer may make more than one transaction (even though each transaction is a single item), then you would have one sequence per customer, and you could find sequential patterns representing what several customers have bought over time. For example: a customer buys item A and later buys item B.
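As a sketch of how one sequence per customer could be built, here is a short Python example. The purchase rows and the function names are invented for the illustration. The output lines use the SPMF text format for sequence databases, where -1 marks the end of an itemset and -2 marks the end of a sequence; since each purchase is a single item, each itemset contains one item.

```python
from collections import defaultdict

def to_sequences(purchases):
    """Group (customer, date, item_id) rows into one time-ordered sequence per customer."""
    by_customer = defaultdict(list)
    for customer, date, item in purchases:
        by_customer[customer].append((date, item))
    sequences = {}
    for customer, rows in by_customer.items():
        rows.sort()  # ISO dates ("YYYY-MM-DD") sort chronologically as strings
        sequences[customer] = [item for _, item in rows]
    return sequences

def to_spmf_line(items):
    """One itemset per purchase: each item is followed by -1, the sequence ends with -2."""
    return " ".join(f"{i} -1" for i in items) + " -2"

# Hypothetical purchase log: (customer id, purchase date, item id).
purchases = [
    ("c1", "2020-01-10", 1),
    ("c2", "2020-02-01", 2),
    ("c1", "2020-06-15", 2),
    ("c2", "2020-07-20", 3),
    ("c3", "2020-03-05", 1),
]
sequences = to_sequences(purchases)
spmf_lines = [to_spmf_line(sequences[c]) for c in sorted(sequences)]
print("\n".join(spmf_lines))
```

Customers with a single purchase give sequences of length one, which cannot support a pattern like "A then B", so this modelling is only useful if enough customers buy more than once.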
If you consider all customers together as one long sequence of transactions, then you could try to find episodes in it. An episode is a subsequence that appears frequently in a long sequence. There is no concept of summer or winter, but you could perhaps cheat by encoding "summer" as an item. However, that would not guarantee that the episodes you find will contain "summer".
Another idea would be to split the database into several databases, one per season (summer, winter, etc.), then discover patterns in each database and compare the patterns that you find.
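The splitting step could look like the following Python sketch. The month-to-season mapping is an assumption (northern hemisphere, meteorological seasons), and the data and names are hypothetical. Each resulting list can then be written to its own file and mined separately.

```python
from collections import defaultdict

# Assumed month-to-season mapping (northern hemisphere, meteorological seasons).
SEASON_BY_MONTH = {
    12: "winter", 1: "winter", 2: "winter",
    3: "spring", 4: "spring", 5: "spring",
    6: "summer", 7: "summer", 8: "summer",
    9: "autumn", 10: "autumn", 11: "autumn",
}

def split_by_season(dated_transactions):
    """Group (iso_date, items) pairs into one list of transactions per season."""
    databases = defaultdict(list)
    for iso_date, items in dated_transactions:
        month = int(iso_date[5:7])  # "YYYY-MM-DD" -> MM
        databases[SEASON_BY_MONTH[month]].append(items)
    return dict(databases)

# Hypothetical transactions: (purchase date, encoded items).
dated_transactions = [
    ("2020-01-10", [1, 4]),
    ("2020-07-02", [2, 4]),
    ("2020-12-24", [1, 5]),
    ("2020-08-15", [2, 5]),
]
databases = split_by_season(dated_transactions)
for season, db in sorted(databases.items()):
    print(season, db)
```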
There are several ways. The result may be good or bad depending on what you apply or how the data is prepared.
>
> Would you have any suggestions about relevant
> metrics in these both cases?
Also something interesting that I have not implemented in SPMF is something called "contrast patterns" or "emerging patterns". Those are used to find patterns that are different between two databases.
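Since this is not in SPMF, here is only a rough Python sketch of the usual measure behind emerging patterns, the growth rate: the support of a pattern in one database divided by its support in the other. The two small databases and the function names are invented for the example, and the sketch scores one given pattern; it does not search for patterns.

```python
def support(itemset, database):
    """Fraction of transactions that contain every item of the itemset."""
    if not database:
        return 0.0
    wanted = set(itemset)
    return sum(1 for t in database if wanted <= set(t)) / len(database)

def growth_rate(itemset, db_from, db_to):
    """Support in db_to divided by support in db_from (inf if absent from db_from)."""
    s_from = support(itemset, db_from)
    s_to = support(itemset, db_to)
    if s_from == 0:
        return 0.0 if s_to == 0 else float("inf")
    return s_to / s_from

# Hypothetical seasonal databases with readable item labels.
winter = [["coat", "red"], ["coat", "blue"], ["scarf", "red"], ["coat", "red"]]
summer = [["hat", "red"], ["coat", "red"], ["hat", "blue"], ["hat", "red"]]

print(growth_rate(["coat", "red"], summer, winter))  # how much stronger in winter
print(growth_rate(["hat"], winter, summer))          # never seen in winter
```

A pattern with a large growth rate is much more frequent in one database than in the other, which is one possible way to compare seasons.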
Another thing that you could try, since your data has a temporal dimension, is periodic patterns (something that repeats periodically in a sequence). Not sure if it would be meaningful. There are also some other things implemented in SPMF, like "peak" itemsets, for example a product that has a peak of sales during the Christmas season. Maybe that could be interesting. Just some ideas. Maybe other things could also be tried.
>
> Thank you very much!
You are welcome!
Best regards