Re: Sequential pattern mining datasets

The Data Mining Forum

IMPORTANT: This is the old Data Mining forum.
I keep it online so that you can read the old messages.

Please post your new messages in the new forum: https://forum2.philippe-fournier-viger.com/index.php


Sequential pattern mining datasets

Posted by: webmasterphilfv

Date: April 28, 2010 05:48AM

Here are links to some public datasets (sequence databases) for testing sequential pattern mining algorithms:

BMS-WebView-1 (the first of the three KDD Cup 2000 datasets, which are sometimes called "Gazelle"): click-stream data from a webstore named Gazelle

Original format: BMS1.dat
SPMF format: BMS1_spmf

The Kosarak dataset, a click-stream dataset from a Hungarian news portal:

http://fimi.cs.helsinki.fi/data/

The American Sign Language Dataset:

http://cs-people.bu.edu/panagpap/Research/asl_mining.htm

Legume Sequence Datasets:

http://www.icrisat.org/what-we-do/biotechnology/LegumeSequenceDatasets.html

The IBM Quest Synthetic Data Generator, for generating synthetic datasets:

To download the generator:
http://www.philippe-fournier-viger.com/spmf/datasets/IBM_Quest_data_generator.zip

To download some generated datasets in SPMF format:
- data.slen_10.tlen_1.seq.patlen_2.lit.patlen_8.nitems_5000_spmf.txt
- data.slen_10.tlen_1.seq.patlen_3.lit.patlen_8.nitems_5000_spmf.txt
- data.slen_10.tlen_1.seq.patlen_4.lit.patlen_8.nitems_5000_spmf.txt
- data.slen_10.tlen_1.seq.patlen_5.lit.patlen_8.nitems_5000_spmf.txt
- data.slen_10.tlen_1.seq.patlen_6.lit.patlen_8.nitems_5000_spmf.txt
- data.slen_8.tlen_1.seq.patlen_2.lit.patlen_8.nitems_5000_spmf.txt
- data.slen_8.tlen_1.seq.patlen_3.lit.patlen_8.nitems_5000_spmf.txt
- data.slen_8.tlen_1.seq.patlen_4.lit.patlen_8.nitems_5000_spmf.txt
- data.slen_8.tlen_1.seq.patlen_5.lit.patlen_8.nitems_5000_spmf.txt

The MSNBC click-stream dataset, which you can download from:

http://archive.ics.uci.edu/ml/datasets/MSNBC.com+Anonymous+Web+Data

Additional datasets, such as BIBLE and LEVIATHAN, can be found here:

http://www.philippe-fournier-viger.com/spmf/index.php?link=datasets.php

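Also, here is some Java code for reading these datasets, which I wrote to test some sequential pattern mining algorithms. The methods are meant to be pasted into the SequenceDatabase class of an SPMF algorithm: they rely on its sequences list and its Sequence/Itemset classes (whose API differs slightly between SPMF versions), and they need the java.io.* and java.util.* imports.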

Java code for reading the Kosarak dataset:

public void loadFileKosarakFormat(String filepath, int nblinetoread)
		throws IOException {
	String thisLine;
	BufferedReader myInput = null;
	try {
		FileInputStream fin = new FileInputStream(new File(filepath));
		myInput = new BufferedReader(new InputStreamReader(fin));
		int i = 0;
		while ((thisLine = myInput.readLine()) != null) {
			thisLine = thisLine.trim();
			if (thisLine.isEmpty()) {
				continue; // skip blank lines
			}
			// stop once nblinetoread lines have been read
			if (i == nblinetoread) {
				break;
			}
			i++;
			// each line is one sequence; items are separated by spaces
			String[] split = thisLine.split("\\s+");
			Sequence sequence = new Sequence();
			for (String value : split) {
				// each item forms its own itemset
				List<Integer> itemset = new ArrayList<Integer>();
				itemset.add(Integer.parseInt(value));
				sequence.addItemset(itemset.toArray());
			}
			sequences.add(sequence);
		}
	} catch (Exception e) {
		e.printStackTrace();
	} finally {
		if (myInput != null) {
			myInput.close();
		}
	}
}

Java code for reading the BMSWebView1 format:

public void loadFileWebViewFormat(String filepath, int nbLine) {
	// Each line is "sessionId item"; consecutive lines with the same
	// session id form one sequence (nbLine is not used)
	String thisLine;
	try (BufferedReader myInput = new BufferedReader(
			new InputStreamReader(new FileInputStream(new File(filepath))))) {
		int lastId = 0;
		Sequence sequence = null;
		while ((thisLine = myInput.readLine()) != null) {
			// read one (session id, item) pair
			String[] split = thisLine.trim().split("\\s+");
			int id = Integer.parseInt(split[0]);
			int val = Integer.parseInt(split[1]);

			if (sequence == null || lastId != id) {
				// a new session id starts a new sequence
				if (sequence != null) { // optionally also require sequence.size() >= 2
					sequences.add(sequence);
				}
				sequence = new Sequence();
				lastId = id;
			}
			List<Integer> itemset = new ArrayList<Integer>();
			itemset.add(val);
			sequence.addItemset(itemset.toArray());
		}
		// add the last sequence, which the loop never adds
		if (sequence != null) {
			sequences.add(sequence);
		}
	} catch (Exception e) {
		e.printStackTrace();
	}
}
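The reader above groups consecutive lines that share the same first column (the session id) into one sequence. The same idea can be used to convert the original two-column format into the SPMF text format, where each item is followed by -1 and a sequence ends with -2 (as in the line 10307 -1 10311 -1 12487 -1 -2 quoted later in this thread). This is a minimal sketch, assuming every line is "id item" separated by whitespace and that the lines of one session are contiguous; the class and method names are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper: converts "sessionId item" lines (BMS1.dat style)
// into SPMF-format lines in which each item is its own itemset.
class WebViewToSpmf {

    static List<String> convert(List<String> lines) {
        List<String> output = new ArrayList<>();
        StringBuilder current = null;   // SPMF line being built for the current session
        String lastId = null;
        for (String line : lines) {
            String trimmed = line.trim();
            if (trimmed.isEmpty()) {
                continue;               // skip blank lines
            }
            String[] parts = trimmed.split("\\s+");
            String id = parts[0];
            String item = parts[1];
            if (!id.equals(lastId)) {   // a new session starts a new sequence
                if (current != null) {
                    output.add(current.append("-2").toString());
                }
                current = new StringBuilder();
                lastId = id;
            }
            current.append(item).append(" -1 ");   // item, then end of itemset
        }
        if (current != null) {          // do not forget the last session
            output.add(current.append("-2").toString());
        }
        return output;
    }
}
```

For example, converting the made-up lines "1 10307", "1 10311", "1 12487", "2 10315" gives "10307 -1 10311 -1 12487 -1 -2" and "10315 -1 -2".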

Java code for reading the Snake dataset (while eliminating sequences shorter than 50 symbols):

public void loadSnakeDataset(String filepath, int nbLine) {
	// nbLine is not used; sequences shorter than 50 symbols are skipped
	String thisLine;
	try (BufferedReader myInput = new BufferedReader(
			new InputStreamReader(new FileInputStream(new File(filepath))))) {
		while ((thisLine = myInput.readLine()) != null) {
			if (thisLine.length() >= 50) {
				Sequence sequence = new Sequence();
				for (int i = 0; i < thisLine.length(); i++) {
					// map each letter to an item: 'A' -> 0, 'B' -> 1, ...
					int character = thisLine.charAt(i) - 'A';
					List<Integer> itemset = new ArrayList<Integer>();
					itemset.add(character);
					sequence.addItemset(itemset.toArray());
				}
				sequences.add(sequence);
			}
		}
	} catch (Exception e) {
		e.printStackTrace();
	}
}

Java source code for reading the Sign Language dataset:

public void loadFileSignLanguage(String fileToPath, int i) {
	// i is not used; maxItem and minItem are fields of the enclosing class
	String thisLine;
	try (BufferedReader myInput = new BufferedReader(
			new InputStreamReader(new FileInputStream(new File(fileToPath))))) {
		String oldUtterance = "-1";
		Sequence sequence = null;
		while ((thisLine = myInput.readLine()) != null) {
			// skip empty lines and comment lines starting with '#'
			if (thisLine.length() >= 1 && thisLine.charAt(0) != '#') {
				String[] tokens = thisLine.split(" ");
				// the first token identifies the utterance; a new utterance starts a new sequence
				String currentUtterance = tokens[0];
				if (!currentUtterance.equals(oldUtterance)) {
					if (sequence != null) {
						sequences.add(sequence);
					}
					sequence = new Sequence();
					oldUtterance = currentUtterance;
				}
				for (int j = 1; j < tokens.length; j++) {
					int character = Integer.parseInt(tokens[j]);
					// the values -11 and -12 are skipped
					if (character == -11 || character == -12) {
						continue;
					}
					if (character > maxItem) {
						maxItem = character;
					}
					if (character < minItem) {
						minItem = character;
					}
					sequence.addItemset(new Object[]{character});
				}
			}
		}
		// add the last utterance
		if (sequence != null) {
			sequences.add(sequence);
		}
	} catch (Exception e) {
		e.printStackTrace();
	}
}

Java code for reading the MSNBC dataset (I have not tested this one for a long time, so it would be better to verify that it works correctly):

public void loadFileMSNBCFormat(String filepath, int nblinetoread)
		throws IOException {
	String thisLine;
	BufferedReader myInput = null;
	try {
		FileInputStream fin = new FileInputStream(new File(filepath));
		myInput = new BufferedReader(new InputStreamReader(fin));
		int i = 0;
		while ((thisLine = myInput.readLine()) != null) {
			thisLine = thisLine.trim();
			// skip blank lines and header lines (assumption: data lines start with a digit)
			if (thisLine.isEmpty() || !Character.isDigit(thisLine.charAt(0))) {
				continue;
			}
			// stop once nblinetoread sequences have been read
			if (i == nblinetoread) {
				break;
			}
			i++;
			Sequence sequence = new Sequence(sequences.size());
			for (String value : thisLine.split("\\s+")) {
				// each value forms its own itemset
				Itemset itemset = new Itemset();
				itemset.addItem(Integer.parseInt(value));
				sequence.addItemset(itemset);
			}
			sequences.add(sequence);
		}
	} catch (Exception e) {
		e.printStackTrace();
	} finally {
		if (myInput != null) {
			myInput.close();
		}
	}
}

Java code for reading binary sequence datasets generated by the IBM Quest Data Generator:

public void loadFileBinaryFormat(String path) throws IOException {
	DataInputStream myInput = null;
	try {
		FileInputStream fin = new FileInputStream(new File(path));
		myInput = new DataInputStream(fin);
		Sequence sequence = new Sequence(sequences.size());
		Itemset itemset = new Itemset();
		while (true) {
			int value;
			try {
				value = INT_little_endian_TO_big_endian(myInput.readInt());
			} catch (EOFException eof) {
				break; // end of file (or fewer than 4 bytes left)
			}
			if (value == -1) { // indicates the end of an itemset
				sequence.addItemset(itemset);
				itemset = new Itemset();
			} else if (value == -2) { // indicates the end of a sequence
				if (itemset.size() > 0) { // itemset not closed by -1
					sequence.addItemset(itemset);
					itemset = new Itemset();
				}
				sequences.add(sequence);
				sequence = new Sequence(sequences.size());
			} else {
				// the value is an item
				itemset.addItem(new Item(value));
			}
		}
		// keep a last sequence that was not terminated by -2
		if (itemset.size() > 0) {
			sequence.addItemset(itemset);
		}
		if (sequence.size() > 0) {
			sequences.add(sequence);
		}
	} catch (Exception e) {
		e.printStackTrace();
	} finally {
		if (myInput != null) {
			myInput.close();
		}
	}
}

// Converts a 4-byte little-endian int to big-endian (same result as Integer.reverseBytes(i)).
// This function was written by Anghel Leonard:
int INT_little_endian_TO_big_endian(int i) {
	return ((i & 0xff) << 24) + ((i & 0xff00) << 8) + ((i & 0xff0000) >> 8)
			+ ((i >> 24) & 0xff);
}
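The byte-swapping helper exists because the generator's files hold 4-byte little-endian integers, while DataInputStream.readInt() reads big-endian. As an alternative sketch, java.nio.ByteBuffer can handle the byte order directly and stop cleanly at the end of the data. It assumes the same layout as the code above (-1 ends an itemset, -2 ends a sequence, any other value is an item) and uses plain lists instead of SPMF's Sequence/Itemset classes; the class and method names are hypothetical.

```java
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.ByteOrder;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.List;

// Hypothetical reader for the IBM Quest binary layout assumed above.
class IbmBinaryReader {

    // Parses raw bytes into sequences; a sequence is a list of itemsets.
    static List<List<List<Integer>>> parse(byte[] data) {
        ByteBuffer buffer = ByteBuffer.wrap(data).order(ByteOrder.LITTLE_ENDIAN);
        List<List<List<Integer>>> sequences = new ArrayList<>();
        List<List<Integer>> sequence = new ArrayList<>();
        List<Integer> itemset = new ArrayList<>();
        while (buffer.remaining() >= 4) {        // ignore a truncated trailing int
            int value = buffer.getInt();
            if (value == -1) {                   // end of an itemset
                sequence.add(itemset);
                itemset = new ArrayList<>();
            } else if (value == -2) {            // end of a sequence
                if (!itemset.isEmpty()) {        // itemset not closed by -1
                    sequence.add(itemset);
                    itemset = new ArrayList<>();
                }
                sequences.add(sequence);
                sequence = new ArrayList<>();
            } else {
                itemset.add(value);              // an item
            }
        }
        if (!itemset.isEmpty()) {
            sequence.add(itemset);
        }
        if (!sequence.isEmpty()) {               // data did not end with -2
            sequences.add(sequence);
        }
        return sequences;
    }

    static List<List<List<Integer>>> load(String path) throws IOException {
        return parse(Files.readAllBytes(Paths.get(path)));
    }
}
```

Reading the whole file into memory keeps the sketch short; for very large generated files, a FileChannel-based reader would be a better fit.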

Edited 23 time(s). Last edit at 02/08/2017 06:31PM by webmasterphilfv.


Re: Sequential pattern mining datasets

Posted by: Leo

Date: April 27, 2011 12:02AM

Wow. This is a great list. Thanks. That is gonna be very useful for my project on sequential pattern mining. I was looking for sequential pattern mining datasets for a while and they are very hard to find.

I think that researchers should share their datasets more often. This is a problem in data mining in general. Researchers should also share the source code of their algorithms, because every time I need an algorithm I have to implement it again by myself! What a waste of time! I have tried to contact some researchers by e-mail to ask for their datasets and implementations, but often they do not answer. Anyway, I just want to say thank you for this list ;-)


Re: Sequential pattern mining datasets

Posted by: Wang weina

Date: August 28, 2011 12:38AM

Hi! My name is Wang Weina and I come from Guangxi University, China. I'm interested in the PrefixSpan algorithm. I have come up with some ideas while studying it, and I need datasets for testing sequential pattern mining algorithms. So I hope that you can help me: I am writing to formally request the BMS-WebView-1 and BMS-WebView-2 datasets (KDD Cup 2000). Thank you for your attention to this request. I look forward to hearing from you soon.

Yours sincerely,
Wang weina


Re: Sequential pattern mining datasets

Posted by: webmasterphilfv

Date: August 28, 2011 02:13AM

Hi Wang Weina,

You can download the BMS-WebView1 dataset here:

http://www.philippe-fournier-viger.com/spmf/datasets/BMS1.dat

Also, here is a link to download the IBM Quest Synthetic Data Generator that is frequently used in the data mining literature to generate synthetic datasets:

http://www.philippe-fournier-viger.com/spmf/datasets/IBM_Quest_data_generator.zip

Best,

Philippe

Edited 5 time(s). Last edit at 12/23/2011 04:14PM by webmasterphilfv.


output of spmf code

Posted by: ismah

Date: June 27, 2013 12:42AM

hi,
I have installed NetBeans and imported SPMF into it, but whenever I run it, instead of seeing the output I always get an exception. I don't know where the problem is.
Please help me: how can I run it to see the output of the code?


Re: output of spmf code

Posted by: webmasterphilfv

Date: June 27, 2013 04:51AM

Hi,

What is the exception that you get? Please give me more details so that I can help you.

Philippe


Re: Sequential pattern mining datasets

Posted by: ABI

Date: July 23, 2013 03:04AM

How can I calculate the frequent items from the second column of the BMS-WebView-1 dataset (BMS1.dat)?


Re: Sequential pattern mining datasets

Posted by: tisonet

Date: December 23, 2011 09:05AM

Hi, in the BIDE paper, the authors test their algorithm on the Gazelle, Pi and Snake datasets.

Can I find these datasets somewhere?

Thanks in advance for your reply.


Re: Sequential pattern mining datasets

Posted by: Phil

Date: December 23, 2011 04:05PM

Hello Tisonet,

The Gazelle dataset is actually the BMS data used in KDD Cup 2000. There are three datasets: BMSWebView1, BMSWebView2 and BMSPOS. They are no longer available on the KDD Cup website, but you can download BMSWebView1 from this page. I do not have BMSWebView2 or BMSPOS, but I think that the most interesting dataset is BMSWebView1.

For PI, I did not find a way to get it.

For Snake, I have the dataset. I did not put it on this webpage because it is not a public dataset. But you can send me an e-mail to philippe.fv _AT_ gmail.com .

Best,
Philippe


Re: Sequential pattern mining datasets

Posted by: Dvijesh88

Date: March 10, 2012 04:43AM

Is there any method to read the binary files generated by the IBM synthetic data generator?


Re: Sequential pattern mining datasets

Posted by: webmasterphilfv

Date: March 10, 2012 05:39AM

Hello Dvijesh,

I think that this would work:


	public void loadFileBinaryFormat(String path) throws IOException {
		DataInputStream myInput = null;
		try {
			FileInputStream fin = new FileInputStream(new File(path));
			myInput = new DataInputStream(fin);
			Sequence sequence = new Sequence(sequences.size());
			Itemset itemset = new Itemset();
			while (true) {
				int value;
				try {
					value = INT_little_endian_TO_big_endian(myInput.readInt());
				} catch (EOFException eof) {
					break; // end of file (or fewer than 4 bytes left)
				}
				if (value == -1) { // indicates the end of an itemset
					sequence.addItemset(itemset);
					itemset = new Itemset();
				} else if (value == -2) { // indicates the end of a sequence
					// check if the last "-1" was not included
					if (itemset.size() > 0) {
						sequence.addItemset(itemset);
						itemset = new Itemset();
					}
					sequences.add(sequence);
					sequence = new Sequence(sequences.size());
				} else {
					// extract the value for an item
					Item item = new Item(value);
					itemset.addItem(item);
				}
			}
			// keep a last sequence that was not terminated by -2
			if (itemset.size() > 0) {
				sequence.addItemset(itemset);
			}
			if (sequence.size() > 0) {
				sequences.add(sequence);
			}
		} catch (Exception e) {
			e.printStackTrace();
		} finally {
			if (myInput != null) {
				myInput.close();
			}
		}
	}

	// Converts a 4-byte little-endian int to big-endian (same result as
	// Integer.reverseBytes(i)); taken from the Internet (by Anghel Leonard)
	int INT_little_endian_TO_big_endian(int i) {
		return ((i & 0xff) << 24) + ((i & 0xff00) << 8) + ((i & 0xff0000) >> 8)
				+ ((i >> 24) & 0xff);
	}

Please let me know if it works.

Philippe

Edited 3 time(s). Last edit at 05/10/2012 08:45PM by webmasterphilfv.


Re: Sequential pattern mining datasets

Posted by: Asim

Date: August 02, 2016 08:33AM

Sequence sequence = new Sequence(sequences.size());
When I run this code, it gives me an error saying that the Sequence class is undefined. I am actually executing the above code for the Snake dataset. Please help me load the Snake dataset and run the code on it. Thanks.


Re: Sequential pattern mining datasets

Posted by: Dvijesh88

Date: March 10, 2012 09:19AM

Yes, it works, but I get this error. Why?

java.io.EOFException
at java.io.DataInputStream.readInt(DataInputStream.java:392)
at readfile.SequenceDatabase.loadFileBinaryFormat(SequenceDatabase.java:37)
at readfile.NewClass.main(NewClass.java:19)

After this error, the program shows the patterns,

but they are all repeated, and the sequence ID does not change. I will send you my program.


Re: Sequential pattern mining datasets

Posted by: juned

Date: March 10, 2012 09:12PM

How can I use this source code in the NetBeans IDE?


Re: Sequential pattern mining datasets

Posted by: juned

Date: March 11, 2012 08:35AM

How can I use the dataset-reading source code in the NetBeans IDE?


Re: Sequential pattern mining datasets

Posted by: webmasterphilfv

Date: March 11, 2012 07:36PM

juned Wrote:
-------------------------------------------------------
> How to read the dataset source code in netbeans
> IDE.

Hello Juned,

It depends on which algorithm you want to use. These datasets are sequence databases, so they are suitable for algorithms such as PrefixSpan, BIDE and SPAM.

To use the Java code for reading these datasets, proceed as follows.

For example, if you want to use the Snake dataset with PrefixSpan, then you should copy the method loadSnakeDataset into the class SequenceDatabase of the package ca.pfv.spmf.sequentialpaterns.prefixspan. After that, you would need to modify the test file MainTestPrefixSpan to use loadSnakeDataset() instead of loadFile() for loading the file.

In other words, you just need to paste the methods into the class SequenceDatabase of the algorithm that you want to use.

Normally, it should work, but I did not test it with all the algorithms. You may need to change the names of one or two variables.
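The loaders from the first post only compile inside a class that provides a sequences list and a Sequence class (and, for the Sign Language loader, maxItem/minItem fields); pasting them anywhere else gives errors such as "Sequence class is undefined". Below is a minimal self-contained sketch of such a host class. The Sequence class here is a stand-in that only mimics the addItemset(Object[]) call used above; it is not SPMF's real Sequence class, and every name except loadSnakeDataset is an assumption. The loader takes a BufferedReader instead of a path so it can be tried without a file; for a real file, pass new BufferedReader(new FileReader(path)).

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;

// Stand-in for SPMF's Sequence class (assumption): a list of itemsets.
class Sequence {
    final List<Object[]> itemsets = new ArrayList<>();

    void addItemset(Object[] itemset) {
        itemsets.add(itemset);
    }

    int size() {
        return itemsets.size();
    }
}

// Hypothetical host class playing the role of SequenceDatabase.
class SequenceDatabase {
    final List<Sequence> sequences = new ArrayList<>();

    // Same logic as the Snake loader above, reading from any BufferedReader.
    void loadSnakeDataset(BufferedReader reader) throws IOException {
        String thisLine;
        while ((thisLine = reader.readLine()) != null) {
            if (thisLine.length() >= 50) {               // drop short sequences
                Sequence sequence = new Sequence();
                for (int i = 0; i < thisLine.length(); i++) {
                    int character = thisLine.charAt(i) - 'A';   // 'A' -> 0, 'B' -> 1, ...
                    sequence.addItemset(new Object[] {character});
                }
                sequences.add(sequence);
            }
        }
    }
}
```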

If you want some datasets for algorithms like Apriori or FPGrowth, then you need transaction databases instead of sequence databases. You can find some at fimi.ua.ac.be/data/.

Philippe

Edited 1 time(s). Last edit at 03/11/2012 07:39PM by webmasterphilfv.


Re: Sequential pattern mining datasets

Posted by: Derek

Date: April 18, 2012 02:33PM

Thanks a lot for this list! :)

I've been looking for datasets for a looooong time!


Re: Sequential pattern mining datasets

Posted by: doris

Date: May 11, 2012 06:05AM

But the IBM Quest Synthetic Data Generator does not work at all, neither on Windows XP nor on Windows 7. Is there anything wrong?


Re: Sequential pattern mining datasets

Posted by: Philippe

Date: May 11, 2012 06:21AM

The IBM generator should work. I tried it yesterday and generated some datasets.

What kind of error did you get?

Philippe


Re: Sequential pattern mining datasets

Posted by: tisonet

Date: May 12, 2012 01:14AM

You have to pass correct parameters to the generator.

Try running it with these arguments to get help:

ibm_gen_file.exe seq -help


Re: Sequential pattern mining datasets

Posted by: dilek

Date: May 22, 2012 12:57AM

Thank you very much for sharing the data sets.

I could not find out how to generate datasets in which each transaction contains only one single item.

Is it possible to generate single-item sequence data like web click data with this generator?

Dilek


Re: Sequential pattern mining datasets

Posted by: webmasterphilfv

Date: May 22, 2012 07:33AM

Hello Dilek,

You are welcome! I did not find a way to generate a single item per itemset with the IBM dataset generator. I think that it may not be possible with it.

A solution would be to modify the IBM generator.

Otherwise, an alternative is to use a simple sequence database generator in Java that I wrote. The website: http://www.philippe-fournier-viger.com/seqDBGen/

It is simpler than the IBM generator, but it can generate sequences whose itemsets contain single items.
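If modifying a generator is not an option, even a very small random generator can produce single-item itemsets for quick tests. The sketch below is neither the IBM generator nor the seqDBGen tool; it is a hypothetical stand-in that draws items uniformly at random (so, unlike the IBM generator, it plants no frequent patterns) and writes lines in the SPMF format (item -1 ... -2). The class name and parameters are assumptions.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Random;

// Hypothetical generator: every itemset contains exactly one item.
class SingleItemSequenceGenerator {

    static List<String> generate(int nbSequences, int maxLength, int nbItems, long seed) {
        Random random = new Random(seed);                 // fixed seed gives reproducible output
        List<String> lines = new ArrayList<>();
        for (int s = 0; s < nbSequences; s++) {
            int length = 1 + random.nextInt(maxLength);   // 1..maxLength itemsets
            StringBuilder line = new StringBuilder();
            for (int k = 0; k < length; k++) {
                int item = 1 + random.nextInt(nbItems);   // items numbered 1..nbItems
                line.append(item).append(" -1 ");         // one item, then end of itemset
            }
            lines.add(line.append("-2").toString());      // end of sequence
        }
        return lines;
    }
}
```

The resulting lines can be written to a file with java.nio.file.Files.write(path, lines).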

Best,
Philippe

Edited 1 time(s). Last edit at 10/30/2012 07:59AM by webmasterphilfv.


Re: Sequential pattern mining datasets

Posted by: Dvijesh88

Date: May 27, 2012 10:48PM

Hello,

I want to know one thing related to the IBM generator.

If I write that I use a dataset C10-T2.5-S4-I1.25,

what is the meaning of S and I here?
Please give some brief information. I am confused.


Re: Sequential pattern mining datasets

Posted by: webmasterphilfv

Date: May 28, 2012 04:11AM

Hello Dvijesh,

I've searched a little, and apparently the meanings are:

D: number of sequences in the dataset
C: average number of itemsets per sequence
T: average number of items per itemset
S: average number of itemsets in potentially frequent sequences.
I: average size of itemsets in potentially frequent sequences
N: number of different items in the dataset

It is not very clear, but I think that S and I describe the frequent sequential patterns: S should be the average number of itemsets in frequent sequential patterns, and I the average number of items per itemset in those patterns.

Best,

Philippe


Re: Sequential pattern mining datasets

Posted by: Dvijesh88

Date: May 31, 2012 09:20PM

That really confuses me, because before applying the algorithm, how can we know the S and I values, sir?

That's why I am confused.


Re: Sequential pattern mining datasets

Posted by: webmasterphilfv

Date: June 01, 2012 09:16AM

Hi Dvijesh,

These options are parameters of the IBM Generator for generating synthetic datasets. They are NOT parameters of sequential pattern mining algorithms such as PrefixSpan. To see how to use the parameters of the IBM Generator, you can use the parameter -help, which prints the description of the parameters:

-ncust number_of_customers_in_000s (default: 100)
  -slen avg_trans_per_customer (default: 10)
  -tlen avg_items_per_transaction (default: 2.5)
  -nitems number_of_different_items_in_000s (default: 10)
  -rept repetition-level (default: 0)

  -seq.npats number_of_seq_patterns (default: 5000)
  -seq.patlen avg_length_of_maximal_pattern (default: 4)
  -seq.corr correlation_between_patterns (default: 0.25)
  -seq.conf avg_confidence_in_a_rule (default: 0.75)

  -lit.npats number_of_patterns (default: 25000)
  -lit.patlen avg_length_of_maximal_pattern (default: 1.25)
  -lit.corr correlation_between_patterns (default: 0.25)
  -lit.conf avg_confidence_in_a_rule (default: 0.75)

  -fname <filename> (write to filename.data and filename.pat)
  -ascii (Write data in ASCII format; default: False)
  -version (to print out version info)

In particular, in my opinion, the parameters S and I are specified by the two following parameters (their defaults, 4 and 1.25, match the S4 and I1.25 in the dataset name):

-seq.patlen avg_length_of_maximal_pattern (default: 4)
-lit.patlen avg_length_of_maximal_pattern (default: 1.25)
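Assuming S corresponds to -seq.patlen and I to -lit.patlen, and that C and T correspond to -slen and -tlen (whose defaults, 10 and 2.5, also match C10 and T2.5 in the help above), a dataset name can be translated into generator arguments mechanically. This is a hypothetical helper, not part of the generator; D and N are deliberately left out because papers write their units (thousands or not) inconsistently.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical helper: turns a name like "C10-T2.5-S4-I1.25" into IBM Quest
// arguments, assuming C=-slen, T=-tlen, S=-seq.patlen, I=-lit.patlen.
class QuestNameToFlags {

    private static final Map<Character, String> FLAGS = new LinkedHashMap<>();
    static {
        FLAGS.put('C', "-slen");        // average itemsets (transactions) per sequence
        FLAGS.put('T', "-tlen");        // average items per itemset
        FLAGS.put('S', "-seq.patlen");  // average length of maximal potentially frequent sequences
        FLAGS.put('I', "-lit.patlen");  // average size of itemsets in those sequences
    }

    static List<String> toArguments(String name) {
        List<String> args = new ArrayList<>();
        args.add("seq");                               // sequence mode, as in "ibm_gen_file.exe seq -help"
        for (String part : name.split("-")) {
            if (part.isEmpty()) {
                continue;
            }
            String flag = FLAGS.get(Character.toUpperCase(part.charAt(0)));
            if (flag == null) {
                throw new IllegalArgumentException("Unsupported parameter: " + part);
            }
            args.add(flag);
            args.add(part.substring(1));               // keep the number as written
        }
        return args;
    }
}
```

For "C10-T2.5-S4-I1.25" this yields seq -slen 10 -tlen 2.5 -seq.patlen 4 -lit.patlen 1.25, to which -fname and -ascii from the help listing can be appended.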

Hope that you understand what I mean.

Best,

Philippe

Edited 1 time(s). Last edit at 06/01/2012 09:19AM by webmasterphilfv.


Re: Sequential pattern mining datasets

Posted by: weelian

Date: December 09, 2012 09:02AM

HI,

How can I generate data using IBM Quest?
Is it necessary to install Unix to run the code?

Thanks

Weelian


Re: Sequential pattern mining datasets

Posted by: webmasterphilfv

Date: December 09, 2012 10:40PM

Hi Weelian,

No it is not necessary to use Unix.

First download this file:

http://www.philippe-fournier-viger.com/spmf/datasets/IBM_Quest_data_generator.zip

It contains an .exe file to run it on Windows.

There is an example of how to run it in the zip file. It is not very complicated.

Best,

Philippe


BMS-WebView-1(Gazelle)

Posted by: Jing

Date: January 19, 2013 02:14AM

Hi Philippe,

We are interested in the "BMS-WebView-1" dataset, click-stream data originally from the webstore named Gazelle.

We applied our new data mining method to it and found some interesting patterns beyond traditional sequential patterns. In order to analyse and evaluate the mining results, we need to know the meaning of the numbers in the original dataset, e.g. the first line:

10307 -1 10311 -1 12487 -1 -2

What specific web pages or URLs do the numbers 10307, 10311 or 12487 correspond to?

Is there a place to find out their meanings through a mapping/table somewhere?

Looking forward to your reply!

Jing


Re: BMS-WebView-1(Gazelle)

Posted by: webmasterphilfv

Date: January 19, 2013 04:24AM

Hi Jing,

I don't know their meaning. I just know that they represent webpages from an e-commerce website named Gazelle.com (which no longer exists). To get more information, you should check the KDD Cup 2000 web page:

http://www.sigkdd.org/kddcup/index.php?section=2000&method=info

I just found that they offer some of the original files here (actually not all of them are on the webpage anymore):

http://www.sigkdd.org/kddcup/site/2000/files/KDDCup2000.zip

Maybe there are some tables inside. You could check this out.

Hope this helps and that your method provides good results!

Philippe

Edited 1 time(s). Last edit at 01/19/2013 04:39AM by webmasterphilfv.


Re: Sequential pattern mining datasets

Posted by: Majid

Date: January 26, 2013 09:20PM

Hi

Is there any problem with windows 7 (64 bit)?

I cannot run the exe file here.

Thanks


Re: Sequential pattern mining datasets

Posted by: webmasterphilfv

Date: January 26, 2013 09:26PM

Hi Majid,

It works with Win7 64 bits on my computer.

Maybe it did not work on your computer because you did not run the .exe from the command prompt. Also, if you don't give it parameters, the program will crash.

You can check the .bat file that I have included in the zip file. A .bat file is a text file; it includes an example of how to run the program.

There is also a readme file that gives more details.

But the key is to use the command prompt to run the software.

Best,

Philippe


Re: Sequential pattern mining datasets

Posted by: ismah

Date: June 28, 2013 09:16AM

hi,
Please tell me how I can run the SPMF source code in NetBeans for frequent pattern mining. When I run the main test file of an algorithm, it always gives "an unexpected error occurred". Do I have to install any other file, or is some other method required to run it? Also, do you have any code for classification based on association? If yes, please send it to me; I need it.


Re: Sequential pattern mining datasets

Posted by: ismah

Date: June 28, 2013 10:11AM

hi,
The code now runs, but when I click the ellipsis button to enter the input file, it gives this error message: "java.lang.RuntimeException:Uncompilable source code- Erroneous sym type: pfv.spmf.gui.PathsManger.getinstance.getInputfilePath"


Re: Sequential pattern mining datasets

Posted by: webmasterphilfv

Date: June 29, 2013 05:25AM

This error means that NetBeans was not able to compile all the classes in the project. The problem is probably that you have not installed the code properly in your NetBeans project. I would suggest deleting the project, creating a new one, and installing the source code again.

The instructions for installing the code in Eclipse or Netbeans can be found here:
http://www.philippe-fournier-viger.com/spmf/how_to_install.txt

Philippe

Edited 1 time(s). Last edit at 06/29/2013 05:27AM by webmasterphilfv.


Re: Sequential pattern mining datasets

Posted by: ismah

Date: June 29, 2013 06:32AM

hi,
I found the reason for this error: the problem was the directory of the source package. I was checking the algorithms in the GUI and I did not find the ID3 algorithm. It is in the code, but when I tried to run it through the test file, it gave an exception:

Exception in thread "main" java.lang.ArrayIndexOutOfBoundsException: 4
at pfv.spmf.algorithms.classifiers.decisiontree.id3.AlgoID3.runAlgorithm(AlgoID3.java:83)
at pfv.spmf.test.MainTestID3.main(MainTestID3.java:23)


Re: Sequential pattern mining datasets

Posted by: webmasterphilfv

Date: June 29, 2013 06:51AM

There are just 3 or 4 algorithms that are not in the GUI yet, and ID3 is one of them.

When I run the test file on my computer, it works. Did you use the provided input file or another file?

Philippe


Re: Sequential pattern mining datasets

Posted by: ismah

Date: June 29, 2013 07:52AM

I used the tennis.txt file.


Re: Sequential pattern mining datasets

Posted by: webmasterphilfv

Date: June 29, 2013 08:21AM

I have downloaded the source code, installed it in a new Java project, and run MainTestID3.java, and it works.

Maybe you have modified the tennis.txt file ?


Re: Sequential pattern mining datasets

Posted by: ismah

Date: June 29, 2013 08:37AM

No, I didn't. OK, I will check it again.
Do you have any Java code for classification based on association? I am doing a project on classification based on the FP-tree, so I would like your help in this regard, as I'm new to this field.


Re: Sequential pattern mining datasets

Posted by: webmasterphilfv

Date: June 29, 2013 09:25AM

No, I don't have code for classification based on association rules. I just have the implementation of the FPGrowth algorithm.

All the code that I have is on the SPMF website.

Philippe

Edited 2 time(s). Last edit at 06/29/2013 09:26AM by webmasterphilfv.


Re: Sequential pattern mining datasets

Posted by: ismah

Date: June 29, 2013 09:37AM

ok thanks


Re: Sequential pattern mining datasets

Posted by: Omer

Date: July 11, 2013 08:25AM

Hey, thanks for this great code. I'm Omer; I study SE at UMT. I'm new to data mining, but I think it is very interesting. My project is about sequential pattern mining using the AprioriAll algorithm. I have to use a format made by a fellow student who is working on a preprocessing tool for sequential datasets; it looks like this:

1-3-4 4-3 9-4
4-2-4 5-3 4-5

Your code gave me an idea about how to read this file, but I am stuck on counting the frequency of each item. Could you give a code sample showing how to count the items using a proper data structure?
Thanks very much.


Re: Sequential pattern mining datasets

Posted by: webmasterphilfv

Date: July 19, 2013 07:32AM

Hi,

Sorry for the delay in answering you. I have been on vacation in Mexico and I did not have Internet access for a week.

To count the support of single items, you need to read each sequence one by one in your file.

Then, to count the support of single items, I usually use a HashMap<Integer,Integer> where the key is an item and the value is the support of the item. Each time that an item is read from a sequence, I increase the count for the item in the map. I also use a HashSet<Integer> to remember which items I have already counted, so that I do not increase the support more than once if an item appears multiple times in the same sequence. The code would look like this:

HashMap<Integer, Integer> count = new HashMap<Integer, Integer>();

// assumption: sequences is a List<List<List<Integer>>> (a sequence is a list of itemsets)
for (List<List<Integer>> sequence : sequences) {
   HashSet<Integer> alreadyCounted = new HashSet<Integer>();

   for (List<Integer> itemset : sequence) {
         for (Integer item : itemset) {
              if (!alreadyCounted.contains(item)) {
                 // increase the count of the item in the hashmap
                 Integer support = count.get(item);
                 count.put(item, support == null ? 1 : support + 1);
                 alreadyCounted.add(item);
              }
         }
   }
}

An alternative that is also efficient is to use an array instead of a hash map. If you know the number of items in your dataset, it is a good idea to do that. For example, if you have only 100 items, you can create an array int[100] where each position stores the frequency of one of the items 1, 2, ..., 100. I don't use this in my implementation because I don't know in advance how many items appear in each dataset and what their ids are.
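With an array, the HashSet can also be replaced by a second array that stores, for each item, the index of the last sequence in which it was counted. A minimal sketch, assuming items are non-negative integers no larger than a known maxItem and sequences are given as lists of itemsets (the class and method names are hypothetical):

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical array-based support counter.
class ArraySupportCounter {

    // Returns support[item] = number of sequences containing the item.
    static int[] countSupport(List<List<List<Integer>>> sequences, int maxItem) {
        int[] support = new int[maxItem + 1];
        int[] lastSequence = new int[maxItem + 1];   // last sequence index that counted the item
        Arrays.fill(lastSequence, -1);
        for (int s = 0; s < sequences.size(); s++) {
            for (List<Integer> itemset : sequences.get(s)) {
                for (int item : itemset) {
                    if (lastSequence[item] != s) {   // first occurrence in this sequence
                        lastSequence[item] = s;
                        support[item]++;
                    }
                }
            }
        }
        return support;
    }
}
```

The lastSequence array avoids allocating a new set per sequence, which is why this variant pays off when the item ids are small and dense.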

Philippe

Edited 1 time(s). Last edit at 07/19/2013 08:39AM by webmasterphilfv.


Re: Sequential pattern mining datasets

Posted by: Omer

Date: July 19, 2013 12:12PM

Hi,
Thank you very much for replying, and I hope you enjoyed your holiday :)

I had been waiting a long time; every day I checked this forum almost 20 times :D

I already used the HashMap for counting the items, and I can count each sequence as well as calculate the frequent items, but the problem of counting repeated items in the same row is still there. Now that you have mentioned the HashSet, I will give it a try. I have now reached the maximal phase, which seems confusing to me. Also, I see no difference between AprioriAll and the normal Apriori here, and I don't know why: I could use this code for both!

Here is my code for counting the first candidate itemset C1; please correct me if I am wrong. Thank you again very much indeed, Sir.

String myLine;
BufferedReader myInput = loadFile_Data();
if (myInput.readLine().equals("#data")) {
	// System.out.println("true"); // test
	while ((myLine = myInput.readLine()) != null) {
		// System.out.println("Next Line"); // test
		// read the whole line and divide it into tokens
		StringTokenizer st = new StringTokenizer(myLine);
		// System.out.println("tokens all " + st.countTokens()); // test
		// loop and get each token
		while (st.hasMoreTokens()) {
			// do not enable: nextToken() here would skip a token
			// System.out.println("tokens " + st.nextToken()); // test
			// break each token into items
			String val = st.nextToken();
			String[] split = val.split("-");
			for (String value : split) {
				// System.out.println(value); // test: print all
				// store each item in the HashMap and count its occurrences
				if (items_List.get(value) == null)
					items_List.put(value, 1);
				else
					items_List.put(value, items_List.get(value) + 1);
			}
			// System.out.print(" next time "); // test
		}
	}
}

Re: Sequential pattern mining datasets

Posted by: webmasterphilfv

Date: July 19, 2013 01:42PM

Hi,

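Your code looks good. For more efficiency, you could call items_List.get(value) only once and store the result in a variable:

Integer count = items_List.get(value);

Then test whether count is null, instead of calling get() a second time.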

Then, like I said, you can add the HashSet (a HashSet<String> here, since your items are read as strings) and it would be perfect. HashSet is very simple: you use the method add() to add an item and contains() to check if the item is already in the hash set.
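A minimal sketch putting both suggestions together, assuming the same line format as in the code above (white-space-separated tokens, each made of items joined by "-"). The class name SequenceSupportCounter and the method countSupport are hypothetical. The HashSet<String> is recreated for each line, so an item repeated in a line is counted only once for that line.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.StringTokenizer;

public class SequenceSupportCounter {

    /**
     * Counts in how many lines (sequences) each item appears, counting an item
     * at most once per line even if it is repeated in that line.
     */
    static Map<String, Integer> countSupport(List<String> lines) {
        Map<String, Integer> itemsList = new HashMap<>();
        for (String line : lines) {
            // items already counted for the current line
            Set<String> alreadyCounted = new HashSet<>();
            StringTokenizer st = new StringTokenizer(line);
            while (st.hasMoreTokens()) {
                for (String value : st.nextToken().split("-")) {
                    if (alreadyCounted.contains(value)) {
                        continue; // repeated in this line: do not count it again
                    }
                    alreadyCounted.add(value);
                    // call get() only once and keep the result in a variable
                    Integer count = itemsList.get(value);
                    itemsList.put(value, count == null ? 1 : count + 1);
                }
            }
        }
        return itemsList;
    }

    public static void main(String[] args) {
        List<String> lines = List.of("a-b a c", "b b-c");
        // a appears in 1 line, b and c in 2 lines each (HashMap order may vary)
        System.out.println(countSupport(lines));
    }
}
```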

Best,

Philippe

Re: Sequential pattern mining datasets

Posted by: Omer

Date: July 20, 2013 02:16AM

Hi,
OK, thank you Sir, noted. I will apply this to my project, and if I get stuck again I will drop you a message here. Thank you for your help :)

Re: Sequential pattern mining datasets

Posted by: Omer

Date: August 01, 2013 06:40AM

Hi Sir,

I have successfully implemented the AprioriAll algorithm and I can get the maximal patterns, but I have a problem with association rule generation. I use a method to generate association rules, but in a rule an item should appear on only one side, whereas my method puts the same items on both sides, and I am stuck on how to separate them. I am planning to use a loop, but this will take time and make my algorithm a bit slow. Do you have any idea that could help? Here is my method code:

public void assocation_Rule_Generation() {
    try {
        System.out.println("*************** Association Rules Found *********************\n");

        for (int i = 0; i < maximal.size(); i++) {
            String first = maximal.get(i);
            // maximal_list stores the support of every pattern that satisfies minsup
            double forFisrt = maximal_list.get(first);

            for (int x = 0; x < maximal.size(); x++) {
                String second = maximal.get(x);

                if (!first.equals(second)) {
                    double forSecond = maximal_list.get(second);

                    // note: this is sup(second) / sup(first); the usual confidence
                    // of first ==> second is sup(first U second) / sup(first)
                    double fconfidenst = (forSecond / forFisrt) * 100;
                    if (fconfidenst >= conf) {
                        System.out.printf("%-8s ==> %-8s conf %.2f%%%n", first, second, fconfidenst);
                    }
                }
            }
        }
    } catch (Exception e) {
        System.out.println(e);
        e.printStackTrace();
    }
}

I could only avoid rules where the same pattern is on both sides.
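One common way to keep the two sides apart is the standard association rule procedure for a single frequent itemset I: each non-empty proper subset X becomes the left side and the remaining items I \ X the right side, so no item can appear on both sides, and conf(X ==> I \ X) = sup(I) / sup(X). The sketch below assumes, hypothetically, that the supports of I and of all its non-empty subsets are available in a map keyed by sets of items (every subset of a frequent itemset is frequent, so they are normally known). It enumerates the 2^n - 2 splits with a bit mask, which is fine for short itemsets. It is meant for itemsets; rules between sequential patterns would need an order-aware split instead.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

public class RuleGenerationSketch {

    /** A rule antecedent ==> consequent; the two sides never share an item. */
    static final class Rule {
        final Set<String> antecedent;
        final Set<String> consequent;
        final double confidence;

        Rule(Set<String> antecedent, Set<String> consequent, double confidence) {
            this.antecedent = antecedent;
            this.consequent = consequent;
            this.confidence = confidence;
        }

        @Override
        public String toString() {
            return antecedent + " ==> " + consequent + String.format(" conf %.2f%%", confidence * 100);
        }
    }

    /**
     * Splits the itemset into all pairs (X, itemset minus X) with X non-empty and not the
     * whole itemset, keeping rules with confidence sup(itemset) / sup(X) >= minConf.
     * supports must contain the itemset and all of its non-empty subsets.
     */
    static List<Rule> generateRules(List<String> itemset, Map<Set<String>, Integer> supports, double minConf) {
        List<Rule> rules = new ArrayList<>();
        int n = itemset.size();
        double supportOfItemset = supports.get(new HashSet<>(itemset));
        // each mask from 1 to 2^n - 2 selects a non-empty proper subset as the left side
        for (int mask = 1; mask < (1 << n) - 1; mask++) {
            Set<String> left = new HashSet<>();
            Set<String> right = new HashSet<>();
            for (int i = 0; i < n; i++) {
                if ((mask & (1 << i)) != 0) {
                    left.add(itemset.get(i));
                } else {
                    right.add(itemset.get(i));
                }
            }
            double confidence = supportOfItemset / supports.get(left);
            if (confidence >= minConf) {
                rules.add(new Rule(left, right, confidence));
            }
        }
        return rules;
    }

    public static void main(String[] args) {
        // hypothetical supports of {a, b, c} and of all its non-empty subsets
        Map<Set<String>, Integer> supports = new HashMap<>();
        supports.put(Set.of("a"), 4);
        supports.put(Set.of("b"), 5);
        supports.put(Set.of("c"), 3);
        supports.put(Set.of("a", "b"), 3);
        supports.put(Set.of("a", "c"), 2);
        supports.put(Set.of("b", "c"), 3);
        supports.put(Set.of("a", "b", "c"), 2);
        for (Rule rule : generateRules(List.of("a", "b", "c"), supports, 0.6)) {
            System.out.println(rule);
        }
    }
}
```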

Re: Sequential pattern mining datasets

Posted by: Suraya Fadeela

Date: July 26, 2013 08:46PM

Hi Philippe,

I must commend you for the wonderful work you are doing here in this forum.

I am currently working on a project, and at this stage I would like to mine maximal sequential patterns from my dataset. Each item has three dimensions (x,y,z). I wish to use your PrefixSpan implementation, but I am having difficulty doing so because of the nature of the data. I would appreciate any suggestions.

Fadeela

Re: Sequential pattern mining datasets

Posted by: webmasterphilfv

Date: July 27, 2013 05:17AM

Hi,

You could have a look at algorithms for mining maximal sequential patterns. I do not have implementations of these algorithms, but some exist. Maybe they could give you some ideas.

You say that items have three dimensions. What kind of dimensions? Integers? Nominal values? If you just have a small set of values for each dimension, you could consider transforming each possible value into an item. For example, i(x,y,z) could become three items: ix, iy, iz.
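A minimal sketch of this transformation, where the class and method names are hypothetical: each (dimension, value) pair gets its own integer id, so that x=5 and y=5 become different items. If, instead, the three values only make sense together, the same dictionary can give the whole (x,y,z) combination a single id, which an unmodified PrefixSpan can then treat as an ordinary item (at the cost of losing patterns that share only some dimensions). Before writing ids as one SPMF itemset, sort them, since the SPMF input format is described with the items of an itemset in ascending order.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class ItemEncoder {
    // dictionary from a textual key to the integer item id given to it
    private final Map<String, Integer> ids = new HashMap<>();

    /** Returns the id of the key, assigning the next unused id (1, 2, 3, ...) the first time. */
    int idOf(String key) {
        Integer id = ids.get(key);
        if (id == null) {
            id = ids.size() + 1;
            ids.put(key, id);
        }
        return id;
    }

    /** One item per dimension value, tagged with the dimension name. */
    List<Integer> dimensionItems(int x, int y, int z) {
        List<Integer> items = new ArrayList<>();
        items.add(idOf("x=" + x));
        items.add(idOf("y=" + y));
        items.add(idOf("z=" + z));
        return items;
    }

    /** Alternative: the whole (x,y,z) combination becomes a single item. */
    int tripleItem(int x, int y, int z) {
        return idOf("(" + x + "," + y + "," + z + ")");
    }

    public static void main(String[] args) {
        ItemEncoder encoder = new ItemEncoder();
        System.out.println(encoder.dimensionItems(5, 5, 7)); // [1, 2, 3]
        System.out.println(encoder.dimensionItems(5, 6, 7)); // [1, 4, 3]
        System.out.println(encoder.tripleItem(5, 5, 7));     // 5
        System.out.println(encoder.tripleItem(5, 5, 7));     // 5 (same triple, same id)
    }
}
```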

But in any case, I think that you will have to write a custom algorithm for that (which could be based on PrefixSpan). PrefixSpan could be a good choice because it is easy to extend.

Philippe

Re: Sequential pattern mining datasets

Posted by: Suraya Fadeela

Date: July 27, 2013 06:59AM

Hi Philippe,

Thanks for the response.

Each dimension (an integer) in my dataset represents a distinct attribute of the data, but together they make up an item. This means that instead of an itemset, I have an item, each item being (x,y,z). I therefore cannot transform each dimension into an item. I am looking at mining sequential patterns from these combined items.

To use your PrefixSpan implementation, I would like to get a clear picture of how to build my itemsets.

I would appreciate some ideas.

Regards
Fadeela

Re: Sequential pattern mining datasets

Posted by: shruti

Date: January 12, 2014 05:40PM

Hi,
I have downloaded the source code and I am running it (Apriori) in Eclipse, but the package ca.pfv.spmf in the src folder shows some errors. Please help me fix those errors so that I can see the output.

Re: Sequential pattern mining datasets

Posted by: webmasterphilfv

Date: January 13, 2014 02:35PM

Hello,

Did you follow the installation instructions?

It is important to follow them carefully. I suggest deleting the project, creating a new one, and following them step by step.

Normally, it should work. If you try again and it still does not work, then post the error so that I can see what the problem is.

Best,

Philippe

Re: Sequential pattern mining datasets

Posted by: anju

Date: March 17, 2014 07:00PM

Can anyone tell me how to implement the IncSpan algorithm in the Weka tool? Is it possible to implement sequence mining with the help of Weka?

Re: Sequential pattern mining datasets

Posted by: webmasterphilfv

Date: March 17, 2014 07:07PM

To my knowledge, Weka does not offer any algorithm for manipulating sequences (it is used mainly for relational data).

If you want some code for sequential pattern mining, you could check my SPMF Java open-source data mining library, which is specialized for mining sequential data.

Re: Sequential pattern mining datasets

Posted by: ANJU

Date: March 17, 2014 07:52PM

With the help of the SPMF framework, can I code my own algorithm?

Re: Sequential pattern mining datasets

Posted by: webmasterphilfv

Date: March 18, 2014 03:30AM

Of course.

First, you would need to go to the download page to get the source code.

Then, you could read the installation instructions.

Then, you could read the developer's guide on the same page.

It is very simple to add a new algorithm to SPMF.

A few people have done it already.

Re: Sequential pattern mining datasets

Posted by: anju

Date: March 18, 2014 07:38AM

Thank you...
I will try it.

Re: Sequential pattern mining datasets

Posted by: anju

Date: March 18, 2014 09:51AM

Can anyone give me the source code of the IncSpan algorithm for sequential pattern mining?

utility pattern mining datasets

Posted by: jaisri

Date: April 02, 2014 08:27PM

I want a dataset for utility-based data mining, and also details about profit datasets.

Re: utility pattern mining datasets

Posted by: webmasterphilfv

Date: April 05, 2014 05:56PM

Datasets for utility itemset mining can be found here:

http://www.philippe-fournier-viger.com/spmf/index.php?link=datasets.php

at the bottom of the page.

Re: Sequential pattern mining datasets

Posted by: zeynab

Date: April 07, 2014 07:50AM

Hi, I need C# code for this paper. Please help me and reply with an implementation in C#:

A Novel Method for Privacy Preserving in Association Rule Mining Based on Genetic Algorithms
Mohammad Naderi Dehkordi
Ph.D Student, Science & Research Branch, Islamic Azad University (IAU)
Department of Computer Engineering, Tehran, Iran
E-mail: naderi@iaun.ac.ir

Re: Sequential pattern mining datasets

Posted by: malsoru

Date: May 16, 2014 06:11PM

Q. How can I draw graphs/charts of various values, such as min_sup vs. execution time, for sequential pattern mining?
I am working with sequential pattern mining algorithms such as GSP, SPADE, SPAM, PrefixSpan and CloSpan, and datasets such as Kosarak, BMS, Leviathan, Snake, SIGN and FIFA, with the help of SPMF, and I got values for several parameters such as MAX. PATTERNS LENGTH, MIN_SUP, TOTAL TIME, FREQUENT SEQUENCES COUNT and MAX. MEMORY (mb). Now I want to draw graphs/charts to compare the results of the above algorithms. How can I draw them?

Re: Sequential pattern mining datasets

Posted by: webmasterphilfv

Date: May 16, 2014 06:54PM

To do that, I use Microsoft Excel. You need to enter the results (execution time, etc.) into Excel and then you can draw the charts using the tools from Excel.

I cannot help you with how to use Excel, but I can send you a sample Excel file with some charts that have already been made, to give you an idea of how to do it. Send me an e-mail at philippe.fv AT gmail.com and I'll send it to you.

Re: Sequential pattern mining datasets

Posted by: malsoru

Date: May 18, 2014 07:32PM

Thanks for the response.

I am good at drawing charts with the tools in Microsoft Excel,
but I need an open-source tool to draw graphs, something like MATLAB.

Re: Sequential pattern mining datasets

Posted by: webmasterphilfv

Date: May 19, 2014 12:58AM

An open-source replacement of MATLAB is Octave. It works with the same language.

Re: Sequential pattern mining datasets

Posted by: rich ryan

Date: May 22, 2014 06:48AM

Downloading the MSNBC stream data comes up with Internet Explorer errors. Is there a problem with Internet Explorer, the MSNBC data stream, or my computer?

Re: Sequential pattern mining datasets

Posted by: webmasterphilfv

Date: May 22, 2014 05:42PM

You can download the MSNBC dataset from here instead:
http://www.philippe-fournier-viger.com/spmf/datasets/MSNBC.txt

Re: Sequential pattern mining datasets

Posted by: Harshad

Date: May 29, 2014 01:01AM

Hi,

I am new to coding. Can anyone help me with the same algorithm, but with code in C#?

Re: Sequential pattern mining datasets

Posted by: Hazhir

Date: July 07, 2014 07:44AM

Hi,
I don't understand the Sequence used in the code for reading the Kosarak dataset. Can anyone help me understand where that Sequence comes from and how I can call this code in a main method? I am very new to programming, so any help is appreciated.

thanks

Re: Sequential pattern mining datasets

Posted by: webmasterphilfv

Date: July 07, 2014 10:05AM

Hi, the code above assumes that a class Sequence and a class Itemset are also defined. Actually, this code was taken from an early version of the SPMF open-source data mining library, where a Sequence was implemented as a List of Itemsets and an Itemset as a List of Integers. The code of the classes Sequence and Itemset is not provided, since the goal here is just to give the main idea for reading the dataset. You could create your own classes Sequence and Itemset.
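If you want to write your own classes, here is a minimal sketch (not the actual SPMF code; the class and method names are hypothetical): Itemset wraps a List<Integer>, Sequence wraps a List<Itemset>, and each Kosarak line is assumed to be one sequence in which every item is its own single-item itemset, which is a natural reading for click-stream data.

```java
import java.io.BufferedReader;
import java.io.FileReader;
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;

public class KosarakReaderSketch {

    /** An itemset implemented as a list of integers. */
    static class Itemset {
        final List<Integer> items = new ArrayList<>();
    }

    /** A sequence implemented as a list of itemsets. */
    static class Sequence {
        final int id;
        final List<Itemset> itemsets = new ArrayList<>();

        Sequence(int id) {
            this.id = id;
        }
    }

    /** Each item of a line becomes its own single-item itemset (assumption). */
    static Sequence parseLine(int id, String line) {
        Sequence sequence = new Sequence(id);
        for (String value : line.trim().split("\\s+")) {
            if (value.isEmpty()) {
                continue; // empty line
            }
            Itemset itemset = new Itemset();
            itemset.items.add(Integer.parseInt(value));
            sequence.itemsets.add(itemset);
        }
        return sequence;
    }

    /** Reads every line of the file as one sequence. */
    static List<Sequence> readDatabase(String filepath) throws IOException {
        List<Sequence> database = new ArrayList<>();
        try (BufferedReader reader = new BufferedReader(new FileReader(filepath))) {
            String line;
            while ((line = reader.readLine()) != null) {
                database.add(parseLine(database.size(), line));
            }
        }
        return database;
    }

    public static void main(String[] args) throws IOException {
        List<Sequence> database = readDatabase("kosarak.txt"); // replace with your file path
        System.out.println(database.size() + " sequences read");
    }
}
```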

But to keep it simple, if you just want to print what is read from the file, you could put this code in a main function:

import java.io.BufferedReader;
import java.io.FileReader;
import java.io.IOException;

public class PrintKosarak {
    public static void main(String[] args) throws IOException {
        String filepath = "kosarak.txt";  // replace this with the input file path
        // try-with-resources closes the reader even if an exception occurs
        try (BufferedReader myInput = new BufferedReader(new FileReader(filepath))) {
            String thisLine;
            int seqID = 0;
            // each line of the file is one sequence
            while ((thisLine = myInput.readLine()) != null) {
                System.out.println("Sequence #" + seqID);
                // the items of a sequence are separated by spaces
                for (String value : thisLine.trim().split("\\s+")) {
                    if (value.isEmpty()) {
                        continue; // empty line
                    }
                    int item = Integer.parseInt(value);
                    System.out.print(item + " ");
                }
                System.out.println();
                seqID++;
            }
        }
    }
}

where filepath is a String variable containing the input file path (you may need to edit it).

Best,

Edited 1 time(s). Last edit at 07/07/2014 10:06AM by webmasterphilfv.

Re: Sequential pattern mining datasets

Posted by: Omer

Date: September 03, 2014 02:08AM

Hi Philippe,

I was wondering if there is any Eclat extension for mining sequential datasets, or any Eclat-like algorithm for sequential data. I am planning to work on this, but I am not sure if it has already been done.

Re: Sequential pattern mining datasets

Posted by: webmasterphilfv

Date: September 03, 2014 04:57AM

Zaki proposed an Eclat-based algorithm for mining sequential patterns in sequence databases. It is named SPADE.

You may have a look at that paper. It is an interesting paper that is not very hard to read, and there is an implementation of SPADE in SPMF, as well as an implementation of CM-SPADE, a faster extension of SPADE.

The idea of using a vertical database as in Eclat with a depth-first search is very powerful. In my opinion, it could be adapted to several pattern mining problems. For example, the current best algorithms for high utility itemset mining (HUI-Miner and its extension FHM) are inspired by Eclat.
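To make the vertical idea concrete, here is a simplified sketch in the spirit of SPADE (not SPMF's SPADE or CM-SPADE code; the names are hypothetical). Each item gets an id-list of (sequence id, itemset position) pairs, and the support of the two-item pattern <{a}, {b}> is obtained by a temporal join of the id-lists of a and b, without rescanning the database. A full SPADE implementation also joins longer patterns and handles itemset extensions (a and b in the same itemset), which this sketch omits.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

public class VerticalDatabaseSketch {

    /** One occurrence of an item: sequence id (sid) and itemset position (eid). */
    static final class Occurrence {
        final int sid;
        final int eid;

        Occurrence(int sid, int eid) {
            this.sid = sid;
            this.eid = eid;
        }
    }

    /** Converts a horizontal database (sequences of itemsets) into item -> id-list. */
    static Map<Integer, List<Occurrence>> buildIdLists(List<List<List<Integer>>> database) {
        Map<Integer, List<Occurrence>> idLists = new HashMap<>();
        for (int sid = 0; sid < database.size(); sid++) {
            List<List<Integer>> sequence = database.get(sid);
            for (int eid = 0; eid < sequence.size(); eid++) {
                for (int item : sequence.get(eid)) {
                    idLists.computeIfAbsent(item, k -> new ArrayList<>()).add(new Occurrence(sid, eid));
                }
            }
        }
        return idLists;
    }

    /**
     * Temporal join: support of <{a}, {b}>, i.e. the number of sequences in which
     * b appears in a later itemset than a, computed from the two id-lists only.
     */
    static int supportOfSequenceExtension(List<Occurrence> listA, List<Occurrence> listB) {
        // for each sequence, the earliest itemset containing a is all the join needs
        Map<Integer, Integer> firstA = new HashMap<>();
        for (Occurrence o : listA) {
            firstA.merge(o.sid, o.eid, Math::min);
        }
        Set<Integer> matchingSequences = new HashSet<>();
        for (Occurrence o : listB) {
            Integer eidA = firstA.get(o.sid);
            if (eidA != null && o.eid > eidA) {
                matchingSequences.add(o.sid);
            }
        }
        return matchingSequences.size();
    }

    public static void main(String[] args) {
        List<List<List<Integer>>> database = List.of(
                List.of(List.of(1), List.of(2)),
                List.of(List.of(2), List.of(1)),
                List.of(List.of(1, 2), List.of(2)));
        Map<Integer, List<Occurrence>> idLists = buildIdLists(database);
        System.out.println(supportOfSequenceExtension(idLists.get(1), idLists.get(2))); // 2
        System.out.println(supportOfSequenceExtension(idLists.get(2), idLists.get(1))); // 1
    }
}
```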

So, it can be a good topic to design algorithms inspired by features of Eclat for some current or new data mining problems. For sequential datasets, there are many possibilities for designing new Eclat-based algorithms. For example, why not try to design an Eclat-based algorithm for mining sequential patterns with time constraints? (I did not check if someone has done it; this is just an example.)

Re: Sequential pattern mining datasets

Posted by: Omer

Date: September 03, 2014 09:27AM

Hey Philippe,

Thank you very much. This really gives me an idea and inspires me to go further with Eclat. I will look into SPADE and CM-SPADE and see if there is any possible enhancement that I could make. I will also consider your suggestion about time constraints if no one has done it yet; it looks interesting. I will let you know if I find anything new.

Cheers
OMER

Re: Sequential pattern mining datasets

Posted by: webmasterphilfv

Date: September 03, 2014 09:52AM

Hi Omer,

Yes, and besides time constraints, you could combine any two problems to create a new problem. For example:

sequential pattern mining + incremental pattern mining = incremental seq. pattern mining

sequential pattern mining + uncertain data mining = uncertain seq. pattern mining

etc.

It is easy to create a new topic by combining topics. Actually, the topics above already exist, but they are just examples, and perhaps no Eclat-based algorithm has been used for these topics.

For time constraints, there exist some algorithms based on PrefixSpan (one of them is named Hirate & Yamana and is offered in SPMF) and perhaps some other algorithms. In that case, if you design a new algorithm, you may want to compare its performance with the existing algorithms.

So in any such topics, you may find some way to apply an Eclat based algorithm.

Best,

Philippe

Edited 1 time(s). Last edit at 09/03/2014 09:54AM by webmasterphilfv.

Re: Sequential pattern mining datasets

Posted by: Omer

Date: September 03, 2014 03:04PM

Thank you Philippe. It seems there are many things to look at, and each of them looks interesting. I was thinking of this incremental mining, but under a different name: I called it Updatable Eclat, which can update the frequent items when a new dataset is given, with no need to start from scratch. Hopefully, in the end I will come up with something useful and interesting too.

Thank you again

Cheers,

OMER

Re: Sequential pattern mining datasets

Posted by: Omer

Date: October 13, 2014 01:48AM

Hi Philippe,

It seems I will go for sequential pattern mining with time constraints based on Eclat, but I am wondering about the dataset that will be used to test the proposed algorithm. Should I create my own dataset format, or is there an existing dataset with timestamps that I can use for testing? I have already developed a sequence dataset generator, but it uses my own format.

Cheers

Re: Sequential pattern mining datasets

Posted by: webmasterphilfv

Date: October 13, 2014 04:59AM

It is better to use some real datasets too.

But I don't have any real dataset with timestamps.

For the real datasets, you may use the sequence datasets available on the SPMF website, such as BMS, etc. But they don't have timestamps. So, maybe you could generate random timestamps for these datasets. Thus, you would have real data with synthetic timestamps. There is a tool that I have recently added to SPMF to add consecutive timestamps to a dataset (example #86). This may be helpful.
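A minimal sketch of the "random timestamps" idea, assuming a time-extended SPMF-style line format in which each itemset is preceded by its timestamp in angle brackets (for example <0> 1 2 -1 <1> 3 -1 -2); check the documentation of the algorithm you use for the exact format it expects. This is not the tool from example #86, and the class and method names are hypothetical. The gap supplier lets the same code produce consecutive timestamps or random gaps.

```java
import java.util.Random;
import java.util.function.IntSupplier;

public class TimestampAdder {

    /**
     * Prefixes each itemset of an SPMF-style sequence line ("1 2 -1 3 -1 -2") with a
     * timestamp "<t>". The first itemset gets 0, and nextGap gives the increment
     * between consecutive itemsets.
     */
    static String addTimestamps(String line, IntSupplier nextGap) {
        StringBuilder out = new StringBuilder();
        int time = 0;
        boolean startOfItemset = true;
        for (String token : line.trim().split("\\s+")) {
            if (token.equals("-2")) {
                out.append("-2"); // end of sequence
                break;
            }
            if (startOfItemset) {
                out.append('<').append(time).append("> ");
                startOfItemset = false;
            }
            out.append(token).append(' ');
            if (token.equals("-1")) { // end of itemset: the next one gets a later time
                time += nextGap.getAsInt();
                startOfItemset = true;
            }
        }
        return out.toString().trim();
    }

    public static void main(String[] args) {
        String line = "1 2 -1 3 -1 -2";
        // consecutive timestamps: prints <0> 1 2 -1 <1> 3 -1 -2
        System.out.println(addTimestamps(line, () -> 1));
        // random gaps between 1 and 5 (synthetic timestamps for real data)
        Random random = new Random(42);
        System.out.println(addTimestamps(line, () -> 1 + random.nextInt(5)));
    }
}
```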

Otherwise, you could generate your own dataset. You could use, for example, the log of a web server. For example, you can get the original web logs of the FIFA dataset here: http://ita.ee.lbl.gov/html/contrib/WorldCup.html You would need to convert them to a sequence database, and you could use the real timestamps of the HTTP requests.
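A minimal sketch of that conversion, assuming the log lines have already been parsed into (client, time, URL) records; parsing the actual FIFA log files is not shown, and the Request class is hypothetical. Each client becomes one sequence, each request a single-item itemset, URLs are mapped to integer item ids, and the real request times are kept as angle-bracket timestamps (<t> item -1), an assumed SPMF-style layout. Splitting a client's requests into sessions (for example after a long pause) would be a reasonable refinement.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

public class LogToSequences {

    /** One HTTP request, assumed to be already parsed from the log file. */
    static final class Request {
        final String client;
        final long time;
        final String url;

        Request(String client, long time, String url) {
            this.client = client;
            this.time = time;
            this.url = url;
        }
    }

    /**
     * Groups requests by client, sorts each group by time and writes one sequence
     * per client, each request being a single-item itemset preceded by its time.
     */
    static List<String> toSequences(List<Request> requests) {
        Map<String, Integer> urlIds = new HashMap<>();
        Map<String, List<Request>> byClient = new TreeMap<>();
        for (Request r : requests) {
            byClient.computeIfAbsent(r.client, k -> new ArrayList<>()).add(r);
        }
        List<String> sequences = new ArrayList<>();
        for (List<Request> clientRequests : byClient.values()) {
            clientRequests.sort(Comparator.comparingLong(r -> r.time));
            StringBuilder line = new StringBuilder();
            for (Request r : clientRequests) {
                Integer id = urlIds.get(r.url);
                if (id == null) { // first time this URL is seen: give it the next item id
                    id = urlIds.size() + 1;
                    urlIds.put(r.url, id);
                }
                line.append('<').append(r.time).append("> ").append(id).append(" -1 ");
            }
            sequences.add(line.append("-2").toString());
        }
        return sequences;
    }

    public static void main(String[] args) {
        List<Request> requests = List.of(
                new Request("client1", 5, "/index.html"),
                new Request("client2", 1, "/news.html"),
                new Request("client1", 2, "/news.html"));
        // prints "<2> 1 -1 <5> 2 -1 -2" then "<1> 1 -1 -2"
        toSequences(requests).forEach(System.out::println);
    }
}
```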

Besides, you can use synthetic sequences generated by your generator to evaluate your algorithm.

Hope this helps.

Best,

Philippe

Re: Sequential pattern mining datasets

Posted by: amir

Date: September 17, 2014 10:24PM

Hi Philippe,
I want to implement sequential pattern mining with a genetic algorithm.
Please help me.
Give me a solution or code.
Thanks

Re: Sequential pattern mining datasets

Posted by: webmasterphilfv

Date: September 18, 2014 08:17AM

Hi Amir,

I don't think that I have time to implement it. But is there a name for this algorithm? Can you post a link to the PDF of the paper that you are interested in?

I'm curious, and maybe I could give you my opinion about how hard it would be to implement if you want to do it yourself.

Best,

spade and gsp

Posted by: chanchl

Date: December 05, 2014 05:58AM

I have to implement the GSP and SPADE algorithms. Please help me get the code in Java.

Re: spade and gsp

Posted by: webmasterphilfv

Date: December 07, 2014 06:23AM

You can get it here: http://www.philippe-fournier-viger.com/spmf/

Re: Sequential pattern mining datasets

Posted by: Clark Danton

Date: March 14, 2015 09:24AM

Dear Philippe,
Thank you very much for your forum; I have found many interesting things in your topics. I am researching data mining, focusing on how to hide high utility sequential patterns, but I cannot find any database that has information about items with their quantities in a sequence, like the picture I enclose. I use http://www.philippe-fournier-viger.com/spmf/dataset/IBM_Quest_data_generator.zip but it does not include quantities. So please help me solve my problem: can you give me a database for mining high utility sequential patterns? Thank you so much!
Best,

transaction datasets

Posted by: rasha

Date: October 13, 2015 04:55AM

Hi, can anyone tell me how to use the IBM generator? I need a transaction dataset for my research on association rules.

Re: transaction datasets

Posted by: webmasterphilfv

Date: October 13, 2015 05:18AM

I think that there is some documentation included with the IBM generator. Otherwise, you may try the generator provided in the SPMF library (see example #114 of the documentation on the SPMF website). This generator is simpler than the IBM generator, but it is easier to use.

problem GAP/SPADE

Posted by: Fernando Levano

Date: November 09, 2015 02:30PM

Hi, I'm looking for a dataset for the SPADE and GSP algorithms for my thesis. I need data with timestamps, sequence ids (sid) and transactions. Please help me.

Re: problem GAP/SPADE

Posted by: Phil

Date: November 09, 2015 04:23PM

If you need data with timestamps, you could use server logs and convert them to sequences. This would result in sequences with timestamps.

Re: Sequential pattern mining datasets

Posted by: john

Date: November 20, 2015 12:52AM

Hi,
can you run the command below?
Seq_data_generator seq –ncust 0.1 –ascii -fname ds1
I get an error.
If you can,
please send it to my e-mail:
pashe3d@gmail.com

Re: Sequential pattern mining datasets

Posted by: Vasko

Date: November 28, 2015 05:47AM

Hello,

My first question is: what does "lit" stand for (i.e., you have "seq", "lit", "tax")?

My second question is:

I am trying to generate transactional datasets using lit, but when I run:
seq_data_generator lit -ntrans 0.01 -ascii -fname test1

all of the transactions are on a SINGLE LINE. Why does it do this?
I tried "seq" and the transactions are on separate lines, but I don't want sequential transactions, as I am testing Apriori.

Can someone help me?

Thanks

Re: Sequential pattern mining datasets

Posted by: Vasko

Date: November 28, 2015 05:59AM

P.S.

Also, if I am trying to generate a transactional dataset for association mining (focusing on frequent itemset mining, not on rule generation) for the dataset described as D2016K.T10.I2,

how would it translate to the parameters of seq_data_generator?

seq_data_generator lit ...

Do I keep some of the parameters (e.g., -npats, -nitems) at their default values, since they are not specified?

The above dataset description (i.e., D2016K.T10.I2) is taken from the paper:

Parallel Mining of Association Rules:
Design, Implementation and Experience by Rakesh Agrawal.

Thank you.

Re: Sequential pattern mining datasets

Posted by: LP

Date: February 29, 2016 09:38PM

Hi,
My project is mining probabilistically frequent sequential patterns in large uncertain databases. I collected a dataset of temperature sensor readings, but I am looking for more uncertain datasets. Could you please help me find them? If you have any idea about finding sequential patterns in such datasets, kindly let me know. Please post the code too, if anyone has it.

Thank you

Re: Sequential pattern mining datasets

Posted by: vydehi

Date: February 29, 2016 09:46PM

Re: Sequential pattern mining datasets

Posted by: Asim

Date: August 01, 2016 12:29AM

Please provide me with the Snake dataset. Thanks.

Re: Sequential pattern mining datasets

Posted by: webmasterphilfv

Date: August 01, 2016 03:11AM

Send me an e-mail at philfv8 AT yahoo.com and I can give it to you.

Re: Sequential pattern mining datasets

Posted by: rhun

Date: August 08, 2016 08:40AM

I am using the IBM Quest data generator. The problem is that if I use a command like bin/seq_data_generator seq -ncust 0.1 -ascii -fname sample, the sample.pat file says that the number of customers is 100, but the sample.data file has fewer than 100 records. This is the case for all values when I specify -ncust greater than 10.

Re: Sequential pattern mining datasets

Posted by: lhphuoc

Date: December 07, 2016 10:27AM

Please, help me!

How can I convert the original Sign dataset to the SPMF format?

Thanks so much!

Edited 1 time(s). Last edit at 12/07/2016 10:30AM by lhphuoc.

Re: Sequential pattern mining datasets

Posted by: webmasterphilfv

Date: December 07, 2016 04:08PM

I wrote some code for that about six years ago to convert Sign to the SPMF format, but I don't know where that code is anymore. It should be quite simple to write the program again, though. You just need to read the Sign file into memory, assign a number to each element of the Sign format, and then write another file as output. If you know how to program and read/write files, you could easily write a program for that.
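The Sign file format is not described here, so the minimal sketch below assumes, hypothetically, that each input line is one sequence of white-space-separated symbolic elements, each element being its own itemset. It reads the file into memory, assigns an integer (1, 2, 3, ...) to each distinct element and writes the sequences in SPMF format; the file and class names are placeholders. Keeping the dictionary lets you translate the patterns found by SPMF back into the original symbols.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class SymbolicToSpmf {

    /**
     * Converts lines of white-space-separated symbolic elements into SPMF lines,
     * giving each distinct element an integer id (1, 2, 3, ...) stored in dictionary.
     */
    static List<String> convert(List<String> lines, Map<String, Integer> dictionary) {
        List<String> output = new ArrayList<>();
        for (String line : lines) {
            if (line.trim().isEmpty()) {
                continue; // skip blank lines
            }
            StringBuilder sb = new StringBuilder();
            for (String element : line.trim().split("\\s+")) {
                Integer id = dictionary.get(element);
                if (id == null) {
                    id = dictionary.size() + 1;
                    dictionary.put(element, id);
                }
                sb.append(id).append(" -1 "); // each element becomes its own itemset
            }
            output.add(sb.append("-2").toString());
        }
        return output;
    }

    public static void main(String[] args) throws IOException {
        // hypothetical file names: adjust them to your files
        List<String> input = Files.readAllLines(Paths.get("sign_original.txt"), StandardCharsets.UTF_8);
        Map<String, Integer> dictionary = new HashMap<>();
        Files.write(Paths.get("sign_spmf.txt"), convert(input, dictionary), StandardCharsets.UTF_8);
        System.out.println(dictionary.size() + " distinct elements");
    }
}
```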

Edited 1 time(s). Last edit at 12/07/2016 04:15PM by webmasterphilfv.

Re: Sequential pattern mining datasets

Posted by: lhphuoc

Date: December 08, 2016 03:19AM

Thanks a lot, Philippe.

I will try to write a program following your instructions.

Thank you.

Re: Sequential pattern mining datasets

Posted by: Mikhail

Date: January 06, 2017 11:48PM

Dear Philippe,

I use the data generator from http://www.philippe-fournier-viger.com/spmf/datasets/IBM_Quest_data_generator.zip with the following options:
lit -ascii (e.g. seq_data_generator lit -ntrans 1 -tlen 10 -nitems 0.1 -ascii).

I expected one transaction per line in the output file, but I get all the transactions on a single line.

Could you please tell me how I can get an ASCII-formatted transaction database with one transaction per line?

Regards, Mikhail

Re: Sequential pattern mining datasets

Posted by: webmasterphilfv

Date: January 09, 2017 11:42PM

Hello Mikhail,

I have tried but cannot find the reason. Actually, I have not used this program for a few years, so I do not really remember how to use it, and it was not written by me. If you cannot find how to use it, you can use the dataset generator offered in the SPMF library. It is easier to use.

Best regards,

Philippe

Re: Sequential pattern mining datasets

Posted by: Hritwik Dutta

Date: June 27, 2018 06:52AM

Dear Sir,

I downloaded the IBM Quest data generator from this link:
http://www.philippe-fournier-viger.com/spmf/datasets/IBM_Quest_data_generator.zip

In this folder, there is a README.txt file that explains how to generate datasets according to our requirements. There is also an example of a generated dataset:

9 1 2479 2 2154 5477 4 440 2276 6036 9838 4 6639 7926 9054 9748 1 1247 1 1992 1 2535 3 5095 7744 9337 6 2343 4737 7188 8518 9241 9486
9 3 765 1542 7203 3 5854 7309 8827 1 5993 4 991 4956 6376 7993 3 606 1798 8964 3 606 1091 2787 2 4023 5242 3 2451 6392 8259 5 703 3355 5892 7169 9239
6 1 9033 3 1845 7713 8778 3 1285 1705 7890 3 4049 6908 7443 2 765 1739 2 3627 9693
1 2 269 4701
2 2 6273 7655 1 2654
6 1 149 2 3533 9471 1 5758 1 1491 1 5024 1 3812
8 3 3404 4764 6487 3 5210 7637 7771 3 1405 5501 9708 3 3572 3875 7705 3 2709 4167 5713 7 886 2870 6918 7963 8571 9125 9795 2 723 6970 2 625 5975
4 1 7851 1 2053 2 5826 8595 1 1734
6 2 7213 7771 1 5510 2 4652 9373 1 9928 2 6771 7774 2 3113 8769
5 2 765 6075 1 5735 1 3541 1 2715 1 2104

I just wanted to confirm the number of transactions in this dataset. Is it 56?
Please correct me if I am wrong.

Thanking you in anticipation.

Yours sincerely,
Hritwik.

Re: Sequential pattern mining datasets

Posted by: webmasterphilfv

Date: June 27, 2018 07:22AM

Hello,

If you clicked on "GENERATE_DATASET.bat" to generate the dataset, it should generate a sequence database, where each line is a sequence.
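Looking at the sample lines quoted in the question, each line seems to follow this layout: the number of itemsets in the sequence, then for each itemset its size followed by its items (for example, 2 2 6273 7655 1 2654 reads as the sequence (6273 7655)(2654)). This layout is inferred from the sample, not taken from the generator's documentation. Under that assumption, summing the first numbers of the ten lines gives 9+9+6+1+2+6+8+4+6+5 = 56 itemsets (transactions) in 10 sequences. A minimal sketch (hypothetical class name) that checks each line against this layout and converts it to the SPMF format:

```java
public class IbmLineConverter {

    /**
     * Converts one line with the layout inferred from the sample (number of itemsets,
     * then for each itemset: its size followed by its items) into SPMF format.
     * Throws a runtime exception if the line does not follow that layout.
     */
    static String toSpmf(String line) {
        String[] tokens = line.trim().split("\\s+");
        int pos = 0;
        int numberOfItemsets = Integer.parseInt(tokens[pos++]);
        StringBuilder out = new StringBuilder();
        for (int i = 0; i < numberOfItemsets; i++) {
            int size = Integer.parseInt(tokens[pos++]);
            for (int j = 0; j < size; j++) {
                out.append(tokens[pos++]).append(' ');
            }
            out.append("-1 "); // end of itemset
        }
        if (pos != tokens.length) {
            throw new IllegalArgumentException("unexpected extra numbers in: " + line);
        }
        return out.append("-2").toString(); // end of sequence
    }

    public static void main(String[] args) {
        String[] sample = {
            "9 1 2479 2 2154 5477 4 440 2276 6036 9838 4 6639 7926 9054 9748 1 1247 1 1992 1 2535 3 5095 7744 9337 6 2343 4737 7188 8518 9241 9486",
            "9 3 765 1542 7203 3 5854 7309 8827 1 5993 4 991 4956 6376 7993 3 606 1798 8964 3 606 1091 2787 2 4023 5242 3 2451 6392 8259 5 703 3355 5892 7169 9239",
            "6 1 9033 3 1845 7713 8778 3 1285 1705 7890 3 4049 6908 7443 2 765 1739 2 3627 9693",
            "1 2 269 4701",
            "2 2 6273 7655 1 2654",
            "6 1 149 2 3533 9471 1 5758 1 1491 1 5024 1 3812",
            "8 3 3404 4764 6487 3 5210 7637 7771 3 1405 5501 9708 3 3572 3875 7705 3 2709 4167 5713 7 886 2870 6918 7963 8571 9125 9795 2 723 6970 2 625 5975",
            "4 1 7851 1 2053 2 5826 8595 1 1734",
            "6 2 7213 7771 1 5510 2 4652 9373 1 9928 2 6771 7774 2 3113 8769",
            "5 2 765 6075 1 5735 1 3541 1 2715 1 2104"
        };
        int itemsets = 0;
        for (String line : sample) {
            System.out.println(toSpmf(line));
            // the first number of a line is its number of itemsets (transactions)
            itemsets += Integer.parseInt(line.trim().split("\\s+")[0]);
        }
        // prints: 56 itemsets in 10 sequences
        System.out.println(itemsets + " itemsets in " + sample.length + " sequences");
    }
}
```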

But I have not used that IBM generator for a long time, so I do not remember how it works. There are some database generators in my SPMF library that are perhaps easier to use.

Best regards,
