Re: Mining patterns of web usage

The Data Mining Forum

Mining patterns of web usage

Posted by: Keith

Date: May 19, 2014 11:19AM

I have a database table that captures web resource accesses (e.g., web pages) from users' sessions. It is fairly easy to construct sequences of accesses from this table.

I am interested in mining common patterns of web usage from these sequences. Basically, if one represents each web resource as a character, the problem is that of finding commonly occurring substrings.
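To make the substring framing concrete, here is a minimal sketch in Python. It is not an SPMF algorithm; the function name `frequent_substrings` and the parameters `min_support` and `max_len` are hypothetical, chosen for illustration. It counts, for every contiguous run of accesses, how many sessions contain it at least once, and keeps the runs that reach a minimum support. It enumerates all substrings by brute force, so it is only reasonable for short sessions or a small `max_len`.

```python
from collections import Counter


def frequent_substrings(sequences, min_support, max_len=None):
    """Return {substring: support} for contiguous substrings that occur
    in at least `min_support` of the input sequences.

    Each substring is returned as a tuple of items. Support counts
    sequences, not occurrences: a substring repeated within one
    sequence is counted once for that sequence.
    """
    support = Counter()
    for seq in sequences:
        seen = set()  # substrings found in this sequence
        n = len(seq)
        for start in range(n):
            # Optionally cap the substring length to limit the work done.
            stop = n if max_len is None else min(n, start + max_len)
            for end in range(start + 1, stop + 1):
                seen.add(tuple(seq[start:end]))
        support.update(seen)
    return {s: c for s, c in support.items() if c >= min_support}


if __name__ == "__main__":
    # Each character stands for one web resource.
    sessions = ["ABC", "ABD", "XAB"]
    for pattern, count in sorted(frequent_substrings(sessions, 3).items()):
        print("".join(pattern), count)
```

With the three example sessions, only "A", "B" and "AB" appear in all of them. For large logs, a suffix-array or suffix-automaton approach would scale better than this enumeration.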

SPMF offers many algorithms that can be used to identify common sequences based on input sequences of item sets. However, most of these seem to have a couple of drawbacks for the particular problem I'm trying to solve:

1. The discovered common sequences ignore mismatches in the original sequences. For example, given two sequences "ABXC" and "AXBC", the algorithms I'm aware of will identify "ABC" as a common sequence. For my work, I want each element of a common sequence to directly follow the preceding one in the input sequences.

2. They apply to item sets of any size. In my case, each item set always consists of a single element. I'm guessing most of these algorithms will suffer a performance penalty by handling the more general case.

Do any of the SPMF algorithms avoid these problems? In particular, I'd like to know of an algorithm that doesn't ignore mismatches in the input sequences.


Re: Mining patterns of web usage

Posted by: webmasterphilfv

Date: May 19, 2014 03:13PM

Hello,

You may want to have a look at the Hirate & Yamana algorithm. It may not do exactly what you want, but it lets you specify gap constraints. You can specify a maxgap of 0 to find only consecutive items.

There is certainly a cost to handling the general case, but I think this cost may not be that big.
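To illustrate what a gap constraint does, here is a small Python sketch. It is not the Hirate & Yamana algorithm and not SPMF's API; the function `occurs` and its `max_gap` parameter are hypothetical. It assumes that "gap" means the number of items skipped between two consecutive pattern items, which may differ from how a particular implementation defines its parameter. With no limit, the pattern is matched as an ordinary subsequence; with a limit of 0, it must appear as a contiguous run.

```python
def occurs(pattern, seq, max_gap=None):
    """Return True if `pattern` occurs in `seq` with at most `max_gap`
    items skipped between consecutive pattern items.

    max_gap=None means no limit (plain subsequence matching).
    max_gap=0 means the pattern items must be adjacent in `seq`.
    """
    def match_from(p_idx, s_idx):
        # pattern[:p_idx] is already matched; the next pattern item
        # must be found at position s_idx or within the allowed gap.
        if p_idx == len(pattern):
            return True
        if max_gap is None:
            limit = len(seq)
        else:
            limit = min(len(seq), s_idx + max_gap + 1)
        for k in range(s_idx, limit):
            if seq[k] == pattern[p_idx] and match_from(p_idx + 1, k + 1):
                return True
        return False

    if not pattern:
        return True
    # The first pattern item may start anywhere in the sequence.
    return any(
        seq[i] == pattern[0] and match_from(1, i + 1)
        for i in range(len(seq))
    )


if __name__ == "__main__":
    for s in ("ABXC", "AXBC"):
        print(s, occurs("ABC", s), occurs("ABC", s, max_gap=0))
```

Using the two sequences from the question, "ABC" matches both "ABXC" and "AXBC" when gaps are unrestricted, and matches neither when the maximum gap is 0.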
