Algorithms for clustering sequential data?

The Data Mining Forum

open-source data mining software

data mining conferences

Data Science for Social and Behavioral Analytics DSSBA 2022

data science journal

IMPORTANT: This is the old Data Mining forum.
I keep it online so that you can read the old messages.

Please post your new messages in the new forum: https://forum2.philippe-fournier-viger.com/index.php

Goto Topic: Previous•Next

Goto: Forum List•Message List•New Topic•Search•Log In•Print View

Algorithms for clustering sequential data?

Posted by: walkman

Date: October 08, 2013 07:04AM

Hi! I'm looking for an approach to cluster sequential data. The problem I'm trying to solve is to figure out different types of consumers by analyzing their paths to purchase. Say for a specific consumer, he first gets an email coupon, than click on a paid search ad, then made a purchase. Another consumer may go through a different path.

Is there a way for me to cluster these paths to characterize different types of consumers? Finding patterns or association rules doesn't solve my problem directly. Any thoughts are appreciated!

Options: Reply•Quote

Re: Algorithms for clustering sequential data?

Posted by: webmasterphilfv

Date: October 08, 2013 09:30AM

Hi,

If you use pattern mining, a simple idea is to compare the sequential patterns shared by the sequences to determine if they belong to the same cluster. When a sequential pattern is found, it can be annotated with the set of sequences containing the pattern. Then this information could be used to cluster sequences by the number of patterns shared.

This is just a basic idea. I have not thought about it in details and maybe that there would be some problems.

Best,

Philippe

Options: Reply•Quote

Re: Algorithms for clustering sequential data?

Posted by: walkman

Date: October 08, 2013 11:41AM

Thank you! That's what I thought as well, but I just wonder whether there is some specific algorithm for sequential data clustering. BTW, can I have the SPMF toolkit output the sequences after a certain pattern? By default it only gives me the total number of sequences.

Options: Reply•Quote

Re: Algorithms for clustering sequential data?

Posted by: webmasterphilfv

Date: October 08, 2013 05:01PM

Yes, some algorithms of SPMF can output the sequences where a pattern appears. It would be simple to do but it would require a little bit programming

If you use the source code of SPMF. The version of PrefixSpan that save to file can be modified as follow to show the sequences where a pattern appears:

In the file AlgoPrefixSpan in the folder ca.pfv.spmf.algorithms.sequentialpatterns.BIDE_and_prefixspan,
change this:

	private void savePattern(SequentialPattern prefix) throws IOException {
		// increase the number of pattern found for statistics purposes
		patternCount++; 
	
		// if the result should be saved to a file
		if(writer != null){
			// create a stringbuffer
			StringBuffer r = new StringBuffer("";
			// for each itemset in this sequential pattern
			for(Itemset itemset : prefix.getItemsets()){
				// for each item
				for(Integer item : itemset.getItems()){
					r.append(item.toString()); // add the item
					r.append(' ');
				}
				r.append("-1 "; // add the itemset separator
			}		
			// add the support
			r.append("#SUP: ";
			r.append(prefix.getAbsoluteSupport());
			
			// write the string to the file
			writer.write(r.toString());
			// start a new line
			writer.newLine();
		}
		// otherwise the result is kept into memory
		else{
			patterns.addSequence(prefix, prefix.size());
		}

	}

by that:

	private void savePattern(SequentialPattern prefix) throws IOException {
		// increase the number of pattern found for statistics purposes
		patternCount++; 
	
		// if the result should be saved to a file
		if(writer != null){
			// create a stringbuffer
			StringBuffer r = new StringBuffer("";
			// for each itemset in this sequential pattern
			for(Itemset itemset : prefix.getItemsets()){
				// for each item
				for(Integer item : itemset.getItems()){
					r.append(item.toString()); // add the item
					r.append(' ');
				}
				r.append("-1 "; // add the itemset separator
			}		
			// add the support
			r.append("#SUP: ";
			r.append(prefix.getAbsoluteSupport());

			r.append(" SEQUENCE IDS : ";
			// for each itemset in this sequential pattern
			for(Integer sequenceID : prefix.getSequenceIDs()){
				// for each item
				r.append(sequenceID); // add the item
				r.append(' ');
			}	
			
			// write the string to the file
			writer.write(r.toString());
			// start a new line
			writer.newLine();
		}
		// otherwise the result is kept into memory
		else{
			patterns.addSequence(prefix, prefix.size());
		}

	}

After that if you run the algorithm by using the test file MainTestPrefixSpan_saveToFile, it will give you a result where each pattern will be followed by the ids of sequences containing the pattern:

1 -1 #SUP: 4 SEQUENCE IDS : 0 1 2 3
1 -1 6 -1 #SUP: 2 SEQUENCE IDS : 0 2
1 -1 1 -1 #SUP: 2 SEQUENCE IDS : 0 1
1 -1 3 -1 #SUP: 4 SEQUENCE IDS : 0 1 2 3
1 -1 3 -1 1 -1 #SUP: 2 SEQUENCE IDS : 0 1
1 -1 3 -1 3 -1 #SUP: 3 SEQUENCE IDS : 0 1 3
1 -1 3 -1 2 -1 #SUP: 3 SEQUENCE IDS : 1 2 3
1 -1 2 -1 #SUP: 4 SEQUENCE IDS : 0 1 2 3
1 -1 2 -1 1 -1 #SUP: 2 SEQUENCE IDS : 0 1
1 -1 2 -1 3 -1 #SUP: 2 SEQUENCE IDS : 0 3
1 -1 2 3 -1 #SUP: 2 SEQUENCE IDS : 0 1
1 -1 2 3 -1 1 -1 #SUP: 2 SEQUENCE IDS : 0 1
1 2 -1 #SUP: 2 SEQUENCE IDS : 0 2
1 2 -1 6 -1 #SUP: 2 SEQUENCE IDS : 0 2
1 2 -1 3 -1 #SUP: 2 SEQUENCE IDS : 0 2
1 2 -1 4 -1 #SUP: 2 SEQUENCE IDS : 0 2
1 2 -1 4 -1 3 -1 #SUP: 2 SEQUENCE IDS : 0 2
1 -1 4 -1 #SUP: 2 SEQUENCE IDS : 0 2
1 -1 4 -1 3 -1 #SUP: 2 SEQUENCE IDS : 0 2
2 -1 #SUP: 4 SEQUENCE IDS : 0 1 2 3
2 -1 6 -1 #SUP: 2 SEQUENCE IDS : 0 2
2 -1 1 -1 #SUP: 2 SEQUENCE IDS : 0 1
2 -1 3 -1 #SUP: 3 SEQUENCE IDS : 0 2 3
2 3 -1 #SUP: 2 SEQUENCE IDS : 0 1
2 3 -1 1 -1 #SUP: 2 SEQUENCE IDS : 0 1
2 -1 4 -1 #SUP: 2 SEQUENCE IDS : 0 2
2 -1 4 -1 3 -1 #SUP: 2 SEQUENCE IDS : 0 2
3 -1 #SUP: 4 SEQUENCE IDS : 0 1 2 3
3 -1 1 -1 #SUP: 2 SEQUENCE IDS : 0 1
3 -1 3 -1 #SUP: 3 SEQUENCE IDS : 0 1 3
3 -1 2 -1 #SUP: 3 SEQUENCE IDS : 1 2 3
4 -1 #SUP: 3 SEQUENCE IDS : 0 1 2
4 -1 3 -1 #SUP: 3 SEQUENCE IDS : 0 1 2
4 -1 3 -1 2 -1 #SUP: 2 SEQUENCE IDS : 1 2
4 -1 2 -1 #SUP: 2 SEQUENCE IDS : 1 2
5 -1 #SUP: 3 SEQUENCE IDS : 1 2 3
5 -1 6 -1 #SUP: 2 SEQUENCE IDS : 2 3
5 -1 6 -1 3 -1 #SUP: 2 SEQUENCE IDS : 2 3
5 -1 6 -1 3 -1 2 -1 #SUP: 2 SEQUENCE IDS : 2 3
5 -1 6 -1 2 -1 #SUP: 2 SEQUENCE IDS : 2 3
5 -1 1 -1 #SUP: 2 SEQUENCE IDS : 2 3
5 -1 1 -1 3 -1 #SUP: 2 SEQUENCE IDS : 2 3
5 -1 1 -1 3 -1 2 -1 #SUP: 2 SEQUENCE IDS : 2 3
5 -1 1 -1 2 -1 #SUP: 2 SEQUENCE IDS : 2 3
5 -1 3 -1 #SUP: 2 SEQUENCE IDS : 2 3
5 -1 3 -1 2 -1 #SUP: 2 SEQUENCE IDS : 2 3
5 -1 2 -1 #SUP: 2 SEQUENCE IDS : 2 3
5 -1 2 -1 3 -1 #SUP: 2 SEQUENCE IDS : 2 3
6 -1 #SUP: 3 SEQUENCE IDS : 0 2 3
6 -1 3 -1 #SUP: 2 SEQUENCE IDS : 2 3
6 -1 3 -1 2 -1 #SUP: 2 SEQUENCE IDS : 2 3
6 -1 2 -1 #SUP: 2 SEQUENCE IDS : 2 3
6 -1 2 -1 3 -1 #SUP: 2 SEQUENCE IDS : 2 3

For the other question, I don't know much about sequence clustering. But I think that some work have perhaps been done related to this in bioinformatics.

Best,

Philippe

Edited 2 time(s). Last edit at 10/08/2013 05:06PM by webmasterphilfv.

Options: Reply•Quote

Re: Algorithms for clustering sequential data?

Posted by: walkman

Date: October 09, 2013 04:58AM

Thank you so much! I'll give it a shot smiling smiley

Options: Reply•Quote

This forum is powered by Phorum and provided by P. Fournier-Viger (© 2012).
Terms of use.