Hi,
Yes, the default input format uses numbers, for efficiency.
However, SPMF also supports an alternative, the ARFF format, which allows you to use strings.
The ARFF format looks like this:
@relation LCCvsLCSH
@attribute LCC string
@attribute LCSH string
@data
AG5, 'Encyclopedias and dictionaries.;Twentieth century.'
AS262, 'Science -- Soviet Union -- History.'
AE5, 'Encyclopedias and dictionaries.'
AS281, 'Astronomy, Assyro-Babylonian.;Moon -- Phases.'
AS281, 'Astronomy, Assyro-Babylonian.;Moon -- Tables.'
For example, this dataset has 5 transactions with two attributes, both of which are strings.
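To make the structure of the example concrete, here is a minimal Python sketch that reads the header and data lines of the snippet above. It is not SPMF's own ARFF reader. The function name parse_simple_arff is hypothetical, and the sketch assumes a simplified subset of ARFF: one record per line, comma-separated values, and single-quoted strings.

```python
import csv

# The example dataset from this post, embedded as a string.
ARFF_TEXT = """\
@relation LCCvsLCSH
@attribute LCC string
@attribute LCSH string
@data
AG5, 'Encyclopedias and dictionaries.;Twentieth century.'
AS262, 'Science -- Soviet Union -- History.'
AE5, 'Encyclopedias and dictionaries.'
AS281, 'Astronomy, Assyro-Babylonian.;Moon -- Phases.'
AS281, 'Astronomy, Assyro-Babylonian.;Moon -- Tables.'
"""


def parse_simple_arff(text):
    """Hypothetical helper: return (attribute names, data rows).

    Handles only a simplified subset of ARFF, enough for the
    example above. It is not a full implementation of the format.
    """
    attributes = []
    rows = []
    in_data = False
    for raw in text.splitlines():
        line = raw.strip()
        # Skip blank lines and ARFF comment lines (starting with %).
        if not line or line.startswith("%"):
            continue
        if in_data:
            # Single quotes protect commas inside a string value.
            reader = csv.reader([line], quotechar="'", skipinitialspace=True)
            rows.append(next(reader))
        elif line.lower().startswith("@attribute"):
            # Second token is the attribute name (assumes no spaces in it).
            attributes.append(line.split()[1])
        elif line.lower().startswith("@data"):
            in_data = True
    return attributes, rows


attributes, rows = parse_simple_arff(ARFF_TEXT)
print(attributes)  # ['LCC', 'LCSH']
print(len(rows))   # 5
```

Note that the comma inside 'Astronomy, Assyro-Babylonian. ...' stays within a single value because the string is quoted.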
If you want some sample datasets in ARFF format, there are a few on the "dataset" page of the SPMF website, along with a link to the formal ARFF specification.
If you use the graphical user interface of SPMF, the ARFF format is supported for all itemset and association rule mining algorithms and you don't need to do anything special. The graphical interface will detect that a file is in ARFF format.
If you want to use the ARFF format in the source code version of SPMF, see example #62 in the documentation for more information.
Best,
Philippe