Hi Stefan,
Sorry, I did not see your message earlier. Thanks for using SPMF. My answers are below.
> (1) What would be an feasible and working
> algorithm? I searched the site but most are made
> for sequences of integers, and my actions have
> string values.
Yes, you can use SPMF with sequences of strings. To define the string values corresponding to the integers, you can use the following format:
@CONVERTED_FROM_TEXT
@ITEM=1=apple
@ITEM=2=orange
@ITEM=3=tomato
@ITEM=4=milk
@ITEM=5=bread
@ITEM=6=noodle
@ITEM=7=rice
@ITEM=-1=|
1 -1 1 2 3 -1 1 3 -1 4 -1 3 6 -1 -2
1 4 -1 3 -1 2 3 -1 1 5 -1 -2
5 6 -1 1 2 -1 4 6 -1 3 -1 2 -1 -2
5 -1 7 -1 1 6 -1 3 -1 2 -1 3 -1 -2
That format is explained in the documentation of the sequential pattern mining algorithms offered in SPMF (e.g. CM-SPAM). You can try various algorithms. Some of them, like CM-SPAM, also let you set constraints such as the minimum and maximum gap between itemsets in sequential patterns. There are also algorithms for various types of patterns, such as closed sequential patterns, maximal sequential patterns, compressing sequential patterns, and statistically significant sequential patterns (using Skopus). You can also try the sequential rule mining algorithms, which use the same text format.
This format works with both the command line and the GUI of SPMF. Using it directly from the source code is also possible, but a bit more complicated.
>
> (2) Is there an algorithm that finds sequential
> patterns in exactly one big sequence [71518,
> 50376, 71400, ... , 43022]? This could be
> interesting if calculating it for all the 180
> sequences is to computationally expensive.
For a single sequence, we call that "episode mining" instead of sequential pattern mining. There are a few algorithms in SPMF that can deal with a single sequence. Besides the episode mining algorithms, PFPM can be used to find periodic patterns in a single sequence.
>
> (3) when i tried altering the pandas dataframe
> (which is csv originally) to put into the spmf
> tool, i could not find a way to get it into a form
> the spmf accepts. How could i alter my dataset to
> work in this tool? is there a way to transform a
> normal .csv file to a suited format?
Because different users have different needs, there is no tool to convert all formats to the SPMF format. But typically, it is a few lines of code to do that.
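As an illustration of those "few lines of code", here is a rough Python sketch. It is only a guess at your data layout: it assumes a CSV with one row per action, two hypothetical columns named sequence_id and action, rows already in time order, and each action forming its own itemset. The function name csv_to_spmf is also made up.

```python
import csv
import io


def csv_to_spmf(csv_file, seq_col="sequence_id", item_col="action"):
    """Convert a long-format CSV into the SPMF text format (illustrative helper)."""
    item_ids = {}    # action string -> integer id, numbered from 1 in order of appearance
    sequences = {}   # sequence id -> list of integer item ids, in file order
    for row in csv.DictReader(csv_file):
        item = row[item_col]
        if item not in item_ids:
            item_ids[item] = len(item_ids) + 1
        sequences.setdefault(row[seq_col], []).append(item_ids[item])

    lines = ["@CONVERTED_FROM_TEXT"]
    for name, number in item_ids.items():
        lines.append(f"@ITEM={number}={name}")
    lines.append("@ITEM=-1=|")
    for items in sequences.values():
        # every action is its own itemset, so each item is followed by -1
        lines.append(" ".join(f"{i} -1" for i in items) + " -2")
    return "\n".join(lines) + "\n"


# Hypothetical sample data standing in for the real CSV file.
demo = io.StringIO(
    "sequence_id,action\n"
    "s1,login\n"
    "s1,search\n"
    "s2,search\n"
    "s2,login\n"
    "s2,logout\n"
)
spmf_text = csv_to_spmf(demo)
print(spmf_text)
```

Since each sequence becomes one line of its own length, no padding values are needed for shorter sequences. With a real file, you would pass open("yourfile.csv", newline="") instead of the demo object and write the returned text to a .txt file.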
>
> (4) is there a more efficient way to store this
> dataframe without having it bloated with Nan's?
>
> I have some more questions but maybe some
> suggestions for these will already get me in the
> right direction!
I don't use Python, so I cannot help much with that. But someone has written a Python wrapper that uses the VMSP algorithm from SPMF to mine maximal sequential patterns:
https://github.com/fandu/maximal-sequential-patterns-mining
Looking at that code may give you some ideas about how to use SPMF from Python.
But you can always just call SPMF from the command line with a text file.
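For example, the command line call can be started from a Python script roughly as follows. This is a sketch, not a tested wrapper: it assumes that Java is installed and that spmf.jar is in the current directory, the file names are placeholders, build_spmf_command is a made-up helper, and the parameters after the output file (here a minimum support of 50%) depend on the algorithm, so please check its documentation page.

```python
import subprocess


def build_spmf_command(algorithm, input_path, output_path, *params, jar_path="spmf.jar"):
    """Build the argument list for: java -jar spmf.jar run <algorithm> <input> <output> <params...>"""
    return ["java", "-jar", jar_path, "run", algorithm,
            input_path, output_path, *map(str, params)]


# Placeholder file names; "50%" is assumed to be the minimum support for CM-SPAM.
cmd = build_spmf_command("CM-SPAM", "input.txt", "output.txt", "50%")
print(" ".join(cmd))

# Uncomment to actually run SPMF once Java, spmf.jar and input.txt are in place:
# subprocess.run(cmd, check=True)
```

The patterns are then written to the output file, which you can read back in Python as a normal text file.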
Best regards,
Philippe
Edited 4 time(s). Last edit at 04/11/2019 07:04AM by webmasterphilfv.