The Data Mining Forum
IMPORTANT: This is the old Data Mining forum.
I keep it online so that you can read the old messages.

Please post your new messages in the new forum:
Mining patterns of web usage
Posted by: Keith
Date: May 19, 2014 11:19AM

I have a database table that captures web resource accesses (e.g., web pages) from users' sessions. It is fairly easy to construct sequences of accesses from this table.

I am interested in mining common patterns of web usage from these sequences. Basically, if one represents each web resource as a character, the problem is that of finding commonly occurring substrings.

SPMF offers many algorithms that can be used to identify common sequences based on input sequences of item sets. However, most of these seem to have a couple of drawbacks for the particular problem I'm trying to solve:
  1. The discovered common sequences ignore mismatches (gaps) in the original sequences. For example, given the two sequences "ABXC" and "AXBC", the algorithms I'm aware of will identify "ABC" as a common sequence, even though it appears contiguously in neither. For my work, each element of a common sequence must directly follow the preceding one in the input sequences.
  2. They apply to item sets of any size. In my case, each item set always consists of a single element, so I'm guessing most of these algorithms pay a performance penalty for handling the more general case.
Do any of the SPMF algorithms avoid these problems? In particular, I'd like to know of an algorithm that doesn't ignore mismatches in the input sequences.
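
To make the distinction in point 1 concrete, here is a small Python sketch of the two notions of "contains". It is only an illustration, not anything taken from SPMF; the function names is_subsequence and is_contiguous are made up for this example.

```python
def is_subsequence(pattern, sequence):
    """True if the items of pattern occur in order in sequence, gaps allowed."""
    it = iter(sequence)
    # "item in it" advances the iterator, so order is respected.
    return all(item in it for item in pattern)


def is_contiguous(pattern, sequence):
    """True if pattern occurs in sequence with no other items in between."""
    n = len(pattern)
    return any(
        list(sequence[i:i + n]) == list(pattern)
        for i in range(len(sequence) - n + 1)
    )


if __name__ == "__main__":
    for seq in ("ABXC", "AXBC"):
        print(seq, is_subsequence("ABC", seq), is_contiguous("ABC", seq))
```

With gaps allowed, "ABC" matches both example sequences; under the contiguous definition it matches neither, which is the behaviour asked for above.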

Re: Mining patterns of web usage
Date: May 19, 2014 03:13PM


You may want to have a look at the Hirate & Yamana algorithm. It may not do exactly what you want, but it lets you specify gap constraints. You can specify a maxgap of 0 to find only consecutive items.

There is certainly a cost to handling the general case, but I think this cost may not be that big.
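
For comparison, when every item set holds a single item and no gap is allowed, the task reduces to counting contiguous runs of items across sessions. The Python sketch below is a brute-force illustration under that assumption. It is not the Hirate & Yamana algorithm and not SPMF code, and the function name and the parameters min_support, min_len and max_len are hypothetical.

```python
from collections import Counter


def frequent_contiguous_patterns(sequences, min_support, min_len=2, max_len=5):
    """Return {pattern: support} for contiguous patterns found in at least
    min_support input sequences. Each pattern is a tuple of items."""
    support = Counter()
    for seq in sequences:
        seq = tuple(seq)
        seen = set()  # count each pattern at most once per sequence
        for n in range(min_len, min(max_len, len(seq)) + 1):
            for i in range(len(seq) - n + 1):
                seen.add(seq[i:i + n])
        support.update(seen)
    return {p: c for p, c in support.items() if c >= min_support}


if __name__ == "__main__":
    sessions = ["ABXC", "AXBC", "ABC"]
    print(frequent_contiguous_patterns(sessions, min_support=2))
```

Support here is counted per session rather than per occurrence, and max_len bounds the number of candidate patterns. For large logs a suffix-array or suffix-tree approach would likely scale better than this enumeration.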


This forum is powered by Phorum and provided by P. Fournier-Viger (© 2012).