The Data Mining Forum
IMPORTANT: This is the old Data Mining forum.
I keep it online so that you can read the old messages.

Please post your new messages in the new forum: https://forum2.philippe-fournier-viger.com/index.php
 
Where to start ?
Posted by: Freddy Mercury
Date: May 15, 2013 07:59AM

I would like to use data mining techniques to help with the following:

I have a set of 5-10 people picked out of a pool of 100 assigned to a (repetitive) task, which can be either remote or local. The task has to complete within 300 days.

Past history for about 500 executions of the task is available, listing the composition of the group, whether the task was local or remote, and the actual number of days it took to complete.

I would like to be able to determine which sets of people lead to the completion of the task within the 300-day deadline and which sets do not.

I would also like to be able to determine whether tasks that are remote are generally completed faster than tasks that are local, and how this relates to the people who took part in their execution.

Possible? Which technique should I be looking at? Which tool?

Any help greatly appreciated.

Re: Where to start ?
Posted by: calebk
Date: March 24, 2014 02:03PM

I would like to apply an incremental sequence extraction algorithm.
I am using a PostgreSQL database.
What can I do? How can I start?

Re: Where to start ?
Date: March 24, 2014 04:51PM

For some algorithms, you may need to export your database as a text file.

Also, you would need to find an implementation of the algorithm that you are looking for or implement it.

Re: Where to start ?
Posted by: calebk
Date: April 07, 2014 03:34AM

Thanks,
I am looking for the ISE source code, and I don't know how to interpret the output files produced by SPMF.

Re: Where to start ?
Posted by: calebk
Date: April 07, 2014 03:36AM

How can I get the executable version of SPMF?

Re: Where to start ?
Date: April 07, 2014 05:39AM

Hello,

You can go to the download page of the website:

http://www.philippe-fournier-viger.com/spmf/index.php?link=download.php

Then, you can download spmf.jar which is a jar file.

Then you can use this jar file as a command line tool or as a graphical interface.

How? You need to read the installation instructions on the download page that explain how to use it.
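As a rough illustration of the command-line use, here is a small Python sketch that only builds the command to run. It assumes the `java -jar spmf.jar run <algorithm> <input> <output> <parameters>` form described in the SPMF documentation. The algorithm name, the file names and the 50% threshold are placeholder values, and `build_spmf_command` is a hypothetical helper, not part of SPMF. Please check the documentation page of the algorithm you want for its exact parameters.

```python
def build_spmf_command(algorithm, input_path, output_path, *params, jar="spmf.jar"):
    """Build the argument list for one command-line run of SPMF.

    Assumption: the jar is invoked as
    java -jar spmf.jar run <algorithm> <input> <output> <params...>
    """
    return ["java", "-jar", jar, "run", algorithm, input_path, output_path,
            *[str(p) for p in params]]


# Placeholder values for illustration only.
cmd = build_spmf_command("PrefixSpan", "contextPrefixSpan.txt", "output.txt", "50%")
print(" ".join(cmd))
```

To actually launch the run, the list can be passed to `subprocess.run(cmd, check=True)` on a machine where Java is installed and `spmf.jar` is in the current directory.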

Re: Where to start ?
Posted by: calebk
Date: April 07, 2014 07:57AM

Thank you sir, but what about the ISE algorithm source code?

Re: Where to start ?
Date: April 07, 2014 09:25AM

I don't know this algorithm.

Re: Where to start ?
Posted by: calebk
Date: April 07, 2014 10:53PM

Can I get an algorithm that you know for incremental sequence mining? The SPMF software doesn't offer an incremental approach.

Re: Where to start ?
Date: April 08, 2014 03:48AM

The closest thing to what you are asking in SPMF is the Itemset-Tree. It is an incremental tree structure that can answer queries about itemsets and association rules. You can update it incrementally.

For incremental sequence mining, there is no algorithm for this in SPMF currently.



Edited 1 time(s). Last edit at 04/08/2014 04:26AM by webmasterphilfv.

Re: Where to start ?
Posted by: calebk
Date: April 09, 2014 02:21AM

Thanks,
I exported my database as a text file. Now I want to convert it to SPMF format, but I don't know what to use as database input format for converting. Is it possible to use only CSV_INTEGER? What is the min processor capacity required for mining very large databases?

Re: Where to start ?
Date: April 10, 2014 08:22AM

"i don't know what to use as database input format for converting"

I'm not sure what you mean by this. If you mean that you want to convert a transaction database from CSV_INTEGER to the SPMF format, then yes, SPMF offers a tool for that (see the documentation).
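To give an idea of what such a conversion does, here is a minimal Python sketch. It is not the SPMF converter. It assumes that the input has one transaction per line with integer items separated by commas, and that the target format is one transaction per line with items separated by single spaces, sorted and without duplicates. The function names and the sample data are made up for illustration.

```python
def csv_integer_line_to_spmf(line):
    """Turn one comma-separated line of integer items into a space-separated line."""
    # A set removes duplicate items; sorting gives a consistent item order.
    items = {int(token) for token in line.split(",") if token.strip()}
    return " ".join(str(item) for item in sorted(items))


def convert(lines):
    """Convert an iterable of CSV lines, skipping empty ones."""
    converted = []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        converted.append(csv_integer_line_to_spmf(line))
    return converted


# Made-up sample: four transactions over items 1 to 5.
sample = ["1,3,4", "2,3,5", "1,2,3,5", "2,5"]
for row in convert(sample):
    print(row)
```

If the exported file contains item names rather than integers, `int()` will raise an error here, so the names would first have to be mapped to integers.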

"What is the min processor capacity required for mining very large databases?"

It depends which algorithms you are using, what parameters you are using for the algorithm and what kind of data you have as input.

When thresholds such as the minimum support are set lower, the number of patterns can in some cases increase exponentially. If your data has very long transactions or sequences, a few items repeated many times, or many similar transactions/sequences, this will also increase the number of patterns and thus the memory and execution time required. Also, some algorithms are faster than others.

So the answer to this question depends on (1) the algorithm, (2) the data, (3) the parameters.
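To illustrate the effect of the parameters, here is a small brute-force Python sketch that counts frequent itemsets in a toy transaction database for several minimum support values. The data and the function name are invented for this example, and the exhaustive enumeration is only practical for tiny inputs. It is not how the algorithms in SPMF work.

```python
from itertools import combinations


def count_frequent_itemsets(transactions, minsup):
    """Count the itemsets contained in at least `minsup` transactions (brute force)."""
    items = sorted(set().union(*transactions))
    count = 0
    for size in range(1, len(items) + 1):
        for candidate in combinations(items, size):
            # Support = number of transactions that contain every item of the candidate.
            support = sum(1 for t in transactions if set(candidate) <= t)
            if support >= minsup:
                count += 1
    return count


# Toy database: five similar transactions over four items.
transactions = [{1, 2, 3, 4}, {1, 2, 3}, {1, 2, 4}, {2, 3, 4}, {1, 2, 3, 4}]
for minsup in (5, 4, 3, 2):
    print(minsup, count_frequent_itemsets(transactions, minsup))
```

Even with four items, the count grows quickly as the minimum support drops, because the transactions are similar to each other. With hundreds of items and long transactions, the same effect is what drives memory use and running time.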

Re: Where to start ?
Posted by: calebk
Date: April 11, 2014 02:27AM

"i don't know what to use as input format for converting"
I have a transaction database that I exported as a txt file. When I tried to convert it to SPMF format with the provided tool so that I could mine it, the output file came out empty.
How can I prepare my transaction database? In the examples in the documentation, the algorithms are applied to data that has already been prepared. And how can I interpret the results after running a sequence mining algorithm?



This forum is powered by Phorum and provided by P. Fournier-Viger (© 2012).