The Data Mining Forum
IMPORTANT: This is the old Data Mining forum.
I keep it online so that you can read the old messages.

Please post your new messages in the new forum: https://forum2.philippe-fournier-viger.com/index.php
 
Can SPMF find patterns of web usage leading to a particular outcome?
Posted by: Trevor
Date: August 14, 2013 09:23AM

I'm testing an educational website and I'm trying to see if it helps students. The site is for a plant identification class and it shows pictures of plants the students should know. Students can practice by typing in the names online before they go into class and take a real quiz.

The problem is that overall usage alone doesn't seem to predict success. Sometimes students use the site and then get the plant right on the quizzes. Other times they use the site and then still get it wrong. So I'd like to compare the specific web usage that leads to those two outcomes and see if there's a difference in patterns.

From the SQL server I can get a timestamped database showing what each student did online. My database looks like this:

time  student  species  photo  action
1:15  2        A        5      right answer
2:46  2        A        3      wrong answer
5:05  2        A        2      wrong answer
6:19  2        A        5      right answer
5:47  7        A        1      right answer
5:58  7        A        4      right answer
6:10  7        A        2      wrong answer
6:18  7        A        3      wrong answer


If Student 2 and Student 7 both failed to identify Species A on the test, I want the computer to tell me what they both have in common. In this case, time of studying doesn't seem to make a difference but they both looked at Species A 4 times, they both gave two right and two wrong answers, and they both gave wrong answers to photos 2 & 3.

Can SPMF look at those data and detect the patterns? How would I format the data and what algorithm would I use? My first instinct is to combine all the lines for Student 2 into a long string and combine all the lines for Student 7 into one long string. Am I on the right track?

Re: Can SPMF find patterns of web usage leading to a particular outcome?
Date: August 15, 2013 06:47AM

Hi,

I think that you could view that as a problem of sequential pattern mining.

Yes, you could make one long sequence for each student.

Then the goal of sequential pattern mining is to find subsequences that are common to several students. So I think that it would fit your need.

But how do you represent your data as a sequence of items? Algorithms like PrefixSpan take sequences as input, where each sequence is a list of transactions. A transaction is a group of items.

For your data, I would try to represent it as follows (this is not the SPMF format but it is for illustration):

sequence 2: A5right, A3wrong, A2wrong, A5right
sequence 7: A1right, A4right, A2wrong, A3wrong

So here "A5right", for example, would be an item.
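As a minimal sketch of that encoding step, the following Python groups the example rows by student and builds one item per row. It assumes the time column is in h:mm (the post does not say), and the names `rows`, `to_minutes` and `build_sequences` are made up for illustration.

```python
from collections import defaultdict

# Rows copied from the example table: (time, student, species, photo, action).
rows = [
    ("1:15", 2, "A", 5, "right answer"),
    ("2:46", 2, "A", 3, "wrong answer"),
    ("5:05", 2, "A", 2, "wrong answer"),
    ("6:19", 2, "A", 5, "right answer"),
    ("5:47", 7, "A", 1, "right answer"),
    ("5:58", 7, "A", 4, "right answer"),
    ("6:10", 7, "A", 2, "wrong answer"),
    ("6:18", 7, "A", 3, "wrong answer"),
]


def to_minutes(clock):
    """Convert an 'h:mm' string to minutes (assumed format) for sorting."""
    hours, minutes = clock.split(":")
    return int(hours) * 60 + int(minutes)


def build_sequences(rows):
    """Group rows by student, in time order, as items such as 'A5right'."""
    sequences = defaultdict(list)
    for clock, student, species, photo, action in sorted(
        rows, key=lambda r: to_minutes(r[0])
    ):
        outcome = action.split()[0]  # 'right answer' -> 'right'
        sequences[student].append(f"{species}{photo}{outcome}")
    return dict(sequences)


sequences = build_sequences(rows)
for student, items in sequences.items():
    print(f"sequence {student}: {', '.join(items)}")
```

Sorting by time before grouping keeps each student's items in the order the actions happened, which is what a sequential pattern miner relies on.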

In this example, the only sequential patterns found would be:

pattern 1: A2wrong
pattern 2: A3wrong
pattern 3: ...
....

There would not be any patterns containing more than one item, because no two items appear in the same order in both sequences.
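To check this kind of claim by hand on a tiny example, a brute-force count is enough. The sketch below is not PrefixSpan and not SPMF code; it simply enumerates every ordered subsequence of up to two items and counts how many students contain it. The sequences are built from the rows of the example table, and the function names are hypothetical.

```python
from itertools import combinations

sequences = {
    2: ["A5right", "A3wrong", "A2wrong", "A5right"],
    7: ["A1right", "A4right", "A2wrong", "A3wrong"],
}


def subsequences(seq, max_len):
    """Return the distinct ordered subsequences of seq with up to max_len items."""
    found = set()
    for length in range(1, max_len + 1):
        for idx in combinations(range(len(seq)), length):
            found.add(tuple(seq[i] for i in idx))
    return found


def frequent_patterns(sequences, min_support, max_len=2):
    """Keep the subsequences that occur in at least min_support sequences."""
    counts = {}
    for seq in sequences.values():
        for pattern in subsequences(seq, max_len):
            counts[pattern] = counts.get(pattern, 0) + 1
    return {p: c for p, c in counts.items() if c >= min_support}


# Require a pattern to appear in both students' sequences.
patterns = frequent_patterns(sequences, min_support=2)
for pattern in sorted(patterns):
    print(pattern, patterns[pattern])
```

With a support threshold of two students, no two-item pattern survives, because student 2 answers photo 3 before photo 2 while student 7 does the opposite. Brute force like this only works for toy data; a real miner such as PrefixSpan avoids enumerating every subsequence.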


If you don't care about the sequential ordering, then you could have a look at association rules. Association rules will find associations between items that are common to several students. For example, for the previous sequences, you would find an association rule:

A2wrong --> A3wrong   confidence: 100%   support (frequency): 100%
...
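For illustration, the support and confidence of such a rule can be computed directly once each student's activity is treated as an unordered set of items. This is a hand-rolled sketch, not an SPMF call; `rule_stats` is a hypothetical helper and the item sets come from the example table.

```python
# Each student's activity as an unordered set of items (ordering is ignored).
transactions = {
    2: {"A5right", "A3wrong", "A2wrong"},
    7: {"A1right", "A4right", "A2wrong", "A3wrong"},
}


def rule_stats(transactions, antecedent, consequent):
    """Return (support, confidence) of the rule antecedent --> consequent."""
    total = len(transactions)
    # Students whose item set contains both sides of the rule.
    both = sum(
        1 for items in transactions.values() if antecedent | consequent <= items
    )
    # Students whose item set contains the left-hand side.
    ante = sum(1 for items in transactions.values() if antecedent <= items)
    support = both / total
    confidence = both / ante if ante else 0.0
    return support, confidence


support, confidence = rule_stats(transactions, {"A2wrong"}, {"A3wrong"})
print(f"A2wrong --> A3wrong  support: {support:.0%}  confidence: {confidence:.0%}")
```

Support is the share of students who have both items; confidence is the share of students with the left-hand item who also have the right-hand item.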


Note that in SPMF, items are represented by integers for most algorithms (see the examples for details on the format).
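A rough sketch of that conversion is shown below. It assigns an integer to each distinct item and writes one line per student, assuming the usual SPMF sequence-database layout in which -1 closes an itemset and -2 closes a sequence. Please check the SPMF examples for the exact format your chosen algorithm expects; `encode_for_spmf` is a hypothetical helper, not part of SPMF.

```python
sequences = {
    2: ["A5right", "A3wrong", "A2wrong", "A5right"],
    7: ["A1right", "A4right", "A2wrong", "A3wrong"],
}


def encode_for_spmf(sequences):
    """Map items to integers and build one text line per sequence.

    Assumed layout: each itemset is followed by -1, each sequence by -2.
    Here every itemset holds a single item.
    """
    item_ids = {}
    lines = []
    for items in sequences.values():
        tokens = []
        for item in items:
            # Reuse the id if the item was seen before, otherwise assign the next one.
            item_id = item_ids.setdefault(item, len(item_ids) + 1)
            tokens.append(f"{item_id} -1")
        lines.append(" ".join(tokens) + " -2")
    return lines, item_ids


lines, item_ids = encode_for_spmf(sequences)
for line in lines:
    print(line)
print(item_ids)  # keep this mapping to translate mined patterns back to names
```

Keeping the `item_ids` dictionary matters, because the patterns the tool reports will also be in integers and need to be mapped back to names like "A2wrong".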

Best,

Philippe



Edited 3 time(s). Last edit at 08/15/2013 06:52AM by webmasterphilfv.

Re: Can SPMF find patterns of web usage leading to a particular outcome?
Posted by: Trevor
Date: August 15, 2013 07:34AM

Interesting!

So it sounds like there isn't one catch-all algorithm that will find all the patterns. I'll need to first make a list of the patterns that might be interesting and then choose the appropriate algorithm.

For instance, what you're describing would catch the pattern of giving wrong answers to photos 2 & 3 (which is very useful) but it wouldn't catch the more general pattern of giving two wrong answers and two right answers. If student 7's two wrong answers had come in connection with photo 5, it would go right past the filter.

Any thoughts on how to detect the time patterns? Some students spread their studying out over several days, while others cram right before the test. So it might be the same number of photos, but very different spacing. I'm not sure how to identify or classify those patterns.

Re: Can SPMF find patterns of web usage leading to a particular outcome?
Date: August 15, 2013 08:54AM

Hi again,

Yes, I agree. There are different kinds of patterns, and there is always the possibility of missing some information. Actually, we can see it as two problems. The first one is how to encode the data. In artificial intelligence, it is well known that the way you encode your data brings both limitations and benefits. The second is the pattern mining problem: depending on how you encode your data, you can apply various algorithms, and the patterns found by the algorithm you choose can be more or less useful.
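To make the encoding trade-off concrete, here is a small sketch with three encodings of the same events: photo-specific items, outcome-only items, and plain counts. The second and third are illustrative suggestions for capturing the "two right, two wrong" kind of pattern, not something SPMF prescribes, and all names are hypothetical.

```python
from collections import Counter

# (species, photo, outcome) per student, in time order, from the example table.
events = {
    2: [("A", 5, "right"), ("A", 3, "wrong"), ("A", 2, "wrong"), ("A", 5, "right")],
    7: [("A", 1, "right"), ("A", 4, "right"), ("A", 2, "wrong"), ("A", 3, "wrong")],
}


def encode_specific(student_events):
    """Photo-level items such as 'A5right': precise, but misses general trends."""
    return [f"{sp}{photo}{outcome}" for sp, photo, outcome in student_events]


def encode_general(student_events):
    """Species-level items such as 'A-right': ignores which photo was shown."""
    return [f"{sp}-{outcome}" for sp, _photo, outcome in student_events]


def outcome_counts(student_events):
    """Per-species totals, e.g. {'A-right': 2, 'A-wrong': 2}; ordering is dropped."""
    return Counter(encode_general(student_events))


for student, student_events in events.items():
    print(student, encode_specific(student_events))
    print(student, encode_general(student_events))
    print(student, dict(outcome_counts(student_events)))
```

The photo-level encoding can reveal that both students missed photos 2 and 3, while the counts reveal that both gave two right and two wrong answers. Neither encoding shows both facts, which is the limitation described above.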

To consider time, there are some algorithms that do that. There are some decisions to make or try. First, do you want to see time as a number of minutes or seconds, or just as the relative ordering of events? A sequential pattern mining algorithm like PrefixSpan considers only the ordering of events, not the exact time. There exist some extensions, like the Hirate-Yamana algorithm offered in SPMF, that handle items having a timestamp and let you apply time constraints to find the patterns. However, how the time is handled is rather rigid. For example, the pattern:

(0, A3wrong) (30 minutes later, A5wrong) would be viewed as different from (0, A3wrong) (31 minutes later, A5wrong), which is probably not what you want.

So for handling time, one question is whether you want to split time into intervals or consider the exact time. It may be necessary to look for some algorithms that handle time (maybe not just in SPMF; I just mentioned some algorithms as examples, I don't mean that you should use these).

So basically, there are some decisions to make:
- Do you want to consider time, or only the relative ordering of events?
- Do you want to use time constraints (e.g. find patterns occurring within a maximum amount of time, or where successive events are not separated by more than a given amount of time)?
- etc.
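One of those options, a maximum gap between successive events, can be sketched as follows. This only checks whether a given pattern occurs in one timestamped sequence under a gap limit; it is not the Hirate-Yamana algorithm and not SPMF code. Timestamps are in minutes (assuming the table's time column is h:mm), and `occurs_within_gap` is a hypothetical helper.

```python
def occurs_within_gap(sequence, pattern, max_gap):
    """Check whether pattern occurs in order with at most max_gap minutes
    between successive matched events.

    sequence: list of (minutes, item) pairs sorted by time.
    pattern:  list of items.
    """

    def search(start, pattern_pos, last_time):
        if pattern_pos == len(pattern):
            return True
        for i in range(start, len(sequence)):
            time, item = sequence[i]
            if last_time is not None and time - last_time > max_gap:
                break  # sorted by time, so later events are even further away
            # Try this match, and fall back to later candidates if it fails.
            if item == pattern[pattern_pos] and search(i + 1, pattern_pos + 1, time):
                return True
        return False

    return search(0, 0, None)


student_7 = [(347, "A1right"), (358, "A4right"), (370, "A2wrong"), (378, "A3wrong")]

print(occurs_within_gap(student_7, ["A2wrong", "A3wrong"], max_gap=10))
print(occurs_within_gap(student_7, ["A2wrong", "A3wrong"], max_gap=5))
```

The search backtracks instead of always taking the first match, because an earlier occurrence of an item may be too far from the next one while a later occurrence is close enough.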


Philippe



Edited 1 time(s). Last edit at 08/15/2013 08:58AM by webmasterphilfv.

Re: Can SPMF find patterns of web usage leading to a particular outcome?
Posted by: Trevor
Date: August 26, 2013 04:26AM

Quote
Philippe
For example, the pattern: (0, A3wrong) (30 minutes later, A5wrong) would be viewed as different from (0, A3wrong) (31 minutes later, A5wrong) , which is probably not what you want.

Thanks for the feedback. You've caught the problem exactly. I want to catch general patterns of usage, but I don't want the algorithm to be thrown off by minor differences in minutes or seconds.

It sounds like I should make some hypotheses about the amount of spacing that matters and then round all the time intervals to consistent values. So all gaps of under half an hour will round to 15 minutes, all gaps of half an hour to an hour will round to 45 minutes, and so forth.
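A minimal sketch of that rounding idea is shown below. The first two bins (under 30 minutes rounds to 15, 30 to 60 minutes rounds to 45) follow the description above; the remaining bin edges and representative values are placeholders to be replaced by whatever spacing hypotheses seem reasonable.

```python
import bisect

# Upper edges of the bins, in minutes. Only 30 and 60 come from the post;
# 240 (4 h) and 1440 (24 h) are hypothetical.
BIN_EDGES = [30, 60, 240, 1440]
# Representative value for each bin. 15 and 45 come from the post;
# 150, 840 and 1440 are hypothetical choices.
BIN_VALUES = [15, 45, 150, 840, 1440]


def round_gap(gap_minutes):
    """Map a gap to its bin's representative value (lower edges are inclusive)."""
    return BIN_VALUES[bisect.bisect_right(BIN_EDGES, gap_minutes)]


def rounded_gaps(timestamps):
    """Return the rounded gap between each pair of successive timestamps."""
    return [round_gap(later - earlier) for earlier, later in zip(timestamps, timestamps[1:])]


# Student 2's timestamps in minutes, assuming the table's time column is h:mm.
student_2 = [75, 166, 305, 379]
print(rounded_gaps(student_2))

# Gaps of 30 and 31 minutes now fall into the same bin.
print(round_gap(30), round_gap(31))
```

After rounding, each gap can be attached to the following item (for example as a pair of rounded gap and item name), so that small differences in minutes no longer produce different patterns.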

I'll play around with that and see where it gets me.



This forum is powered by Phorum and provided by P. Fournier-Viger (© 2012).