Non transactional database

Posted by: Simone Zambonim

Date: August 18, 2020 03:37AM

Hello Professor Phil, thank you for providing the community with this great tool.

I am aware the association rules are applied mainly to Transactional Databases, however I was wondering if I could use it to find association or patterns among features of a database.

I have got two different datasets, in the first one I have a dataset with many features describing companies, I would encode these features as items and try to find association among features.

The second one is actually a transactional database. However, it only has one item every purchase (it is an expensive item). I was thinking about using the items features (color, size, etc) as well as the buyer features (age, sex) to characterize patterns in the data. I've come up with the idea that these both set of features, could be treated as a set of items every transaction, and it could help me explain customer features and item features highly associated.

I also would like to check for sequential patterns (now, considering the item itself as a product), is this item sold more in the summer/ winter. But again it is one item per transaction, is it possible to do sequential pattern mining in this case?

Would you have any suggestions about relevant metrics in these both cases?

Thank you very much!

Re: Non transactional database

Posted by: webmasterphilfv

Date: August 18, 2020 09:20AM

Good evening,

Welcome to the forum. Happy that the software is useful, and thanks for using it!
I will try to give some comments/suggestions below.

> I am aware the association rules are applied
> mainly to Transactional Databases, however I was
> wondering if I could use it to find association or
> patterns among features of a database.

Yes, indeed. A transaction database can be viewed as a table with attributes. For example, for a categorical attribute like Gender = {male,female}, you can create two items: one is male, and another is female. If an attribute is numeric, like age, then it can be discretized, for example as: 1 to 18 years old, 25 to 35 years old, etc. That is one way to prepare your data, and you can indeed then apply itemset mining or association rule mining on it.
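To make this data preparation concrete, here is a minimal sketch in Python. The function names, the age bins, and the sample records are all made up for illustration. It assumes the output should follow the SPMF transaction format as I understand it: one transaction per line, with items written as positive integers separated by spaces.

```python
def age_bin(age):
    """Map a numeric age to a categorical label (bin edges are illustrative)."""
    if age <= 18:
        return "age=1-18"
    if age <= 35:
        return "age=19-35"
    return "age=36+"


def encode_records(records):
    """Turn attribute/value records into transactions of integer item ids."""
    item_ids = {}  # "attribute=value" label -> integer id, assigned on first use
    transactions = []
    for record in records:
        labels = []
        for attribute, value in record.items():
            if attribute == "age":
                labels.append(age_bin(value))  # discretize the numeric attribute
            else:
                labels.append(f"{attribute}={value}")  # one item per category
        ids = sorted(item_ids.setdefault(label, len(item_ids) + 1) for label in labels)
        transactions.append(ids)
    return transactions, item_ids


# Hypothetical toy table with one categorical and one numeric attribute.
records = [
    {"gender": "female", "age": 17},
    {"gender": "male", "age": 30},
    {"gender": "female", "age": 30},
]
transactions, item_ids = encode_records(records)
lines = [" ".join(str(i) for i in t) for t in transactions]
```

Keeping the `item_ids` dictionary around lets you translate the integer items in the mined patterns back to readable labels such as "gender=female".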

> I have got two different datasets, in the first
> one I have a dataset with many features describing
> companies, I would encode these features as items
> and try to find association among features.

Sounds like a good idea! You could find association between some features. You could try itemset mining or association rule mining.

If you want to find something specific to some item, you could try the TopKClassRules algorithm. It will find rules that look like this: X --> c, where "c" is an item that you can choose using the "Fixed consequent items" optional parameter. Thus, for example, you can find rules that involve some specific attribute "c". Maybe it would be useful for you.

>
> The second one is actually a transactional
> database. However, it only has one item every
> purchase (it is an expensive item). I was thinking
> about using the items features (color, size, etc)
> as well as the buyer features (age, sex) to
> characterize patterns in the data. I've come up
> with the idea that these both set of features,
> could be treated as a set of items every
> transaction, and it could help me explain customer
> features and item features highly associated.

Yes, this is a good idea. You can indeed encode features as items in transactions. Then you will be able to find associations or itemsets containing items with buyer features and item features. Maybe you could get some interesting results.

However, a drawback is that you could find some patterns that only contain buyer features without any product or item features. Would that still be interesting for you? If you want to avoid this problem and make sure that an itemset contains a given item, some algorithms will let you specify that you search for some specific item (like TopKClassRules that I mentioned before). Or you can just search for everything and look at what you find.
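For the "search for everything and look at what you find" option, a simple post-filter can discard the patterns that contain only one kind of feature. This is only a sketch: the function name and the item labels are hypothetical, and it assumes you already have the mined itemsets as lists of labels.

```python
def mixes_buyer_and_product(itemset, buyer_items, product_items):
    """True if the itemset has at least one buyer item and one product item."""
    items = set(itemset)
    return bool(items & buyer_items) and bool(items & product_items)


# Hypothetical item labels and mined itemsets.
buyer_items = {"sex=female", "age=19-35"}
product_items = {"color=red", "size=large"}
found = [
    ["sex=female", "age=19-35"],   # buyer features only
    ["sex=female", "color=red"],   # mixes both kinds
    ["color=red", "size=large"],   # product features only
]
kept = [p for p in found if mixes_buyer_and_product(p, buyer_items, product_items)]
```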

>
> I also would like to check for sequential patterns
> (now, considering the item itself as a product),
> is this item sold more in the summer/ winter. But
> again it is one item per transaction, is it
> possible to do sequential pattern mining in this
> case?

If you have a single sequence of events, you can do episode mining.

If you have many sequences and you want to find some subsequences that appear in many sequences, then it is sequential pattern mining or sequential rule mining.

So I think that whether you can apply those depends on if you can model your data as one sequence or as many sequences.

If you consider that each customer is a sequence, and a customer may make more than 1 transaction, although each transaction is one item, then you would have a sequence for each customer, and you could find sequential patterns representing what several customers have bought over time. Like if you buy item A then later you buy item B.
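As a rough sketch of the "one sequence per customer" idea, the Python code below groups single-item purchases by customer and orders them by date. The names and the toy data are made up. The last step assumes the SPMF sequence format as I understand it, where -1 ends an itemset and -2 ends a sequence.

```python
from collections import defaultdict


def build_sequences(purchases):
    """Group (customer, date, item) purchases into one ordered item list per customer."""
    by_customer = defaultdict(list)
    # ISO dates (YYYY-MM-DD) sort correctly as plain strings.
    for customer, date, item in sorted(purchases, key=lambda p: (p[0], p[1])):
        by_customer[customer].append(item)
    return dict(by_customer)


def to_spmf_line(items):
    """Write a sequence of single-item itemsets: '-1' closes each itemset, '-2' the sequence."""
    return " ".join(f"{item} -1" for item in items) + " -2"


# Hypothetical purchases: each transaction holds exactly one item id.
purchases = [
    ("c1", "2020-03-01", 2),
    ("c2", "2019-07-15", 1),
    ("c1", "2019-01-10", 1),
    ("c2", "2020-08-01", 2),
]
sequences = build_sequences(purchases)
lines = [to_spmf_line(seq) for seq in sequences.values()]
```

Here both customers bought item 1 and later item 2, so a sequential pattern "1 then 2" would be supported by both sequences.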

If you consider all customers as one long sequence of transactions, then you could try to find the episodes in it. An episode is a subsequence that appears frequently in a long sequence. There is no concept of summer or winter, but you could perhaps cheat by encoding "summer" as an item? However, that would not guarantee that the episodes that you find will contain "summer".

Another idea would be to split the database into several databases, one for each season, and then to discover patterns in each database (summer, winter, etc.), and then to compare the patterns that you find...
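A minimal sketch of the splitting step could look as follows. The month-to-season mapping assumes northern-hemisphere seasons, which may not match your data, and the names and records are hypothetical.

```python
from collections import defaultdict

# Assumption: northern-hemisphere seasons by calendar month; adapt to your region.
SEASON_BY_MONTH = {
    12: "winter", 1: "winter", 2: "winter",
    3: "spring", 4: "spring", 5: "spring",
    6: "summer", 7: "summer", 8: "summer",
    9: "autumn", 10: "autumn", 11: "autumn",
}


def split_by_season(dated_transactions):
    """Split (month, items) records into one transaction database per season."""
    databases = defaultdict(list)
    for month, items in dated_transactions:
        databases[SEASON_BY_MONTH[month]].append(items)
    return dict(databases)


# Hypothetical records: month of the purchase and the items of the transaction.
dated_transactions = [
    (1, ["color=red", "age=19-35"]),
    (7, ["color=blue", "age=19-35"]),
    (8, ["color=blue", "age=36+"]),
]
databases = split_by_season(dated_transactions)
```

Each value of `databases` can then be written to its own file and mined separately before comparing the results.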

There are several ways. The result may be good or bad depending on what you apply or how the data is prepared.

>
> Would you have any suggestions about relevant
> metrics in these both cases?

Also something interesting that I have not implemented in SPMF is something called "contrast patterns" or "emerging patterns". Those are used to find patterns that are different between two databases.
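Since this is not in SPMF, here is a small hand-written sketch of the usual measure behind emerging patterns, the growth rate: the ratio of the support of an itemset in one database to its support in another. The function names and the toy databases are made up, and this only scores one given itemset rather than mining all emerging patterns.

```python
def support(itemset, database):
    """Fraction of transactions that contain every item of the itemset."""
    if not database:
        return 0.0
    wanted = set(itemset)
    return sum(1 for t in database if wanted <= set(t)) / len(database)


def growth_rate(itemset, db_from, db_to):
    """Support in db_to divided by support in db_from."""
    s_from = support(itemset, db_from)
    s_to = support(itemset, db_to)
    if s_from == 0:
        # Absent in db_from: infinite growth if present in db_to, else 0.
        return float("inf") if s_to > 0 else 0.0
    return s_to / s_from


# Hypothetical seasonal databases.
winter = [["pool"], ["heater"], ["heater", "garage"], ["garage"]]
summer = [["pool"], ["pool", "garage"], ["pool"], ["heater"]]
rate = growth_rate(["pool"], winter, summer)
```

In this toy data "pool" appears in 1 of 4 winter transactions and 3 of 4 summer transactions, so its growth rate from winter to summer is 3.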

Another thing that you could try on your data, since you have a temporal dimension, is periodic patterns (something that repeats periodically in a sequence). I am not sure if it would be meaningful. There are also some other things implemented in SPMF, like "peak" itemsets, for example when some product has a peak of sales during the Christmas season. Maybe it could be something interesting. Just some ideas... maybe other things could also be tried.

>
> Thank you very much!

You are welcome!

Best regards

Re: Non transactional database

Posted by: Simone Rosana Zambonim

Date: August 20, 2020 06:45AM

Thank you for kindly replying, Professor.
I was thrilled to read your suggestions.

I was wrapping my mind around all your suggestions, and certainly the TopKClassRule algorithm will be very useful to extract the rules that contain the business interest from these companies' attributes.

Now, regarding the sequential problem I am still a bit confused. I think it is better if I explain exactly what I'm dealing with.
The transactional database is a real estate database and it holds the records of the rental of apartments/houses. That's what I meant by an expensive item, and also why there is a single item in every transaction (exclusive).

We have two goals when mining the sequential pattern.

1st - Which kind of realty is most requested in the summer/winter?
In this case I understood what you meant by adding the season as an attribute and using episode mining, or splitting the database in two and comparing the patterns. In fact, peak high utility itemsets would be interesting in this case too: the itemsets would be composed of realty features, and we would evaluate in which months these items were rented most.

2nd - The second pattern we aim to find is of the kind: when people move, do they upgrade to a larger apartment, or find something cheaper? It obviously depends on the client, so I assume that having both user and product attributes as items would be the way to go, and the sequence would be described by the sequence of moves this user has made. In this case, obviously many items would repeat through the sequence (like the user's sex). Finally, I believe it would be a case of sequential pattern mining.

PS.: I searched for "contrast patterns" or "emerging patterns" and found an interesting paper that led me to this GitHub page, https://github.com/SIMIDAT/epm-framework . The .jar file is not currently working and honestly I don't have the skills in Java to go through the source code, but I thought it would be interesting to share with you.

Thank you, Professor.

Re: Non transactional database

Posted by: webmasterphilfv

Date: August 30, 2020 10:41AM

Good evening,

Sorry for the delay in answering. I read the message a while ago, but I did not find much time recently to answer messages on the forum. As the semester starts and many projects happen at the same time, I had to wait a little to answer.

That is a very interesting topic. I understand now the reason for a single item per transaction.

Since there is a single item per transaction (renting an apartment), I guess that it is best to combine the data of all persons in the same database.

One possibility is to see this as a very long sequence, as I discussed, and apply episode mining. As we discussed, this does not consider the season, but the season can be encoded as an item, which is maybe an acceptable solution.

Another possibility is to split that sequence in a sequence database to apply sequential pattern mining. For example, if you split the sequence into months, you would get 12 sequences, one for each month. But this would not be very helpful because then sequential pattern mining would find patterns common to several months and not necessarily by season. And it would not tell the differences by seasons but find what is common!

Another way, as I said, is to split the database by season into several databases and then mine the patterns in each database separately, or use contrast patterns. But I don't have algorithms for that in SPMF. And thanks for sharing the link for the code. Interesting.

Another way that I just thought of is to apply multi-dimensional sequential pattern mining. For this you could check some algorithms in SPMF. Multi-dimensional sequential pattern mining allows adding some data to annotate sequences. For example, you could have sequences like this:

Person 1: Female - PhD - USA : rentCheapAppt, rentExpensiveAppt, ... etc.
Person 2: Male - HighSchool - Canada: rentExpensiveAppt, rentCheapAppt ... etc.

Then you could find patterns such as: females in the USA like to rent this type of apartment.
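To illustrate what such a pattern means, here is a small hand-written sketch (not the SPMF implementation) that counts how many annotated sequences match a multi-dimensional pattern. The function names are made up, `None` is used here as a wildcard for a dimension, and the toy database keeps only the two events shown above for each person.

```python
def is_subsequence(pattern, sequence):
    """True if the events of pattern appear in sequence in the same order."""
    remaining = iter(sequence)
    return all(event in remaining for event in pattern)


def md_support(dim_pattern, seq_pattern, database):
    """Count sequences whose dimensions and events both match the pattern."""
    count = 0
    for dims, sequence in database:
        # None in dim_pattern matches any value of that dimension.
        dims_ok = all(want is None or want == have
                      for want, have in zip(dim_pattern, dims))
        if dims_ok and is_subsequence(seq_pattern, sequence):
            count += 1
    return count


# Each entry: (gender, education, country) dimensions plus a rental sequence.
database = [
    (("Female", "PhD", "USA"), ["rentCheapAppt", "rentExpensiveAppt"]),
    (("Male", "HighSchool", "Canada"), ["rentExpensiveAppt", "rentCheapAppt"]),
]
females_usa_cheap = md_support(("Female", None, "USA"), ["rentCheapAppt"], database)
cheap_then_expensive = md_support((None, None, None),
                                  ["rentCheapAppt", "rentExpensiveAppt"], database)
```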

There are certainly other ways....

I think maybe there is not a perfect way to find patterns in your data that will answer all your questions about season, profile of users, etc. Maybe you need to use a few algorithms to answer different questions about your data!

In any case, interesting topic!

Best regards,

Edited 2 time(s). Last edit at 08/30/2020 10:43AM by webmasterphilfv.