Posted by: user
Date: March 10, 2022

First, thanks for your informative website for the data mining community.
I would like to ask you about the best method to transform datasets from UCI so that they can be used with frequent itemset mining (FIM) methods.

Date: March 10, 2022


The UCI repository hosts many datasets, and they come in many different formats. So the best way to convert a dataset depends on its format.

To convert a dataset for FIM, you need to decide what the transactions will be and what the items will be. For example, if the data in a dataset is numerical, then you may have to discretize it to obtain items. It all depends on what you want to do.
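
To make the discretization idea more concrete, here is a minimal sketch in Java. It assumes equal-width binning, which is only one of several possible discretization schemes, and it assumes that each (column, bin) pair becomes one item. The class name DiscretizeExample, the method names, and the sample values are all made up for illustration.

```java
import java.util.ArrayList;
import java.util.List;

public class DiscretizeExample {

	// Map a numeric value to one of `bins` equal-width intervals over [min, max].
	static int binOf(double value, double min, double max, int bins) {
		if (max == min) {
			return 0; // constant column: everything falls in a single bin
		}
		int bin = (int) ((value - min) / (max - min) * bins);
		// the maximum value would land in bin `bins`, so clamp it to the last bin
		return Math.min(Math.max(bin, 0), bins - 1);
	}

	// Give each (column, bin) pair its own positive item identifier.
	static int itemId(int column, int bin, int bins) {
		return column * bins + bin + 1;
	}

	// Convert rows of numeric values into transactions of item identifiers.
	// Assumes at least one row and the same number of columns in every row.
	static List<List<Integer>> toTransactions(double[][] rows, int bins) {
		int columns = rows[0].length;
		double[] min = new double[columns];
		double[] max = new double[columns];
		// first pass: find the range of each column
		for (int c = 0; c < columns; c++) {
			min[c] = Double.POSITIVE_INFINITY;
			max[c] = Double.NEGATIVE_INFINITY;
			for (double[] row : rows) {
				min[c] = Math.min(min[c], row[c]);
				max[c] = Math.max(max[c], row[c]);
			}
		}
		// second pass: replace each value by the item of its (column, bin) pair
		List<List<Integer>> transactions = new ArrayList<>();
		for (double[] row : rows) {
			List<Integer> transaction = new ArrayList<>();
			for (int c = 0; c < columns; c++) {
				int bin = binOf(row[c], min[c], max[c], bins);
				transaction.add(itemId(c, bin, bins));
			}
			transactions.add(transaction);
		}
		return transactions;
	}

	public static void main(String[] args) {
		// hypothetical numeric records with two attributes
		double[][] rows = { {1.0, 10.0}, {2.0, 40.0}, {9.0, 20.0}, {5.0, 30.0} };
		for (List<Integer> transaction : toTransactions(rows, 2)) {
			StringBuilder line = new StringBuilder();
			for (int i = 0; i < transaction.size(); i++) {
				if (i > 0) {
					line.append(' ');
				}
				line.append(transaction.get(i));
			}
			System.out.println(line);
		}
		// prints:
		// 1 3
		// 1 4
		// 2 3
		// 2 4
	}
}
```

Because item identifiers grow with the column index, each transaction comes out already sorted and without duplicates. The number of bins is a choice you have to make for your own data: with too few bins, different values collapse into the same item, and with too many, the items become rare.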

As for converting the dataset, I think the best way is to write a small program that reads the file and writes another file. This requires only simple programming skills (reading and writing files). When I convert a dataset, I write a program in Java, but you could use any language, such as Python.

Posted by: user
Date: March 23, 2022

You are right, it is easy to write and read. I would appreciate it if you would share your Java program as an example for converting UCI to FIM for a specific dataset. Your example will encourage many, including me, to convert more datasets.

Re: UCI repository datasets transformation
Date: March 23, 2022

Here is a piece of Java code that I use for converting a CSV file into another format.

The input is like this:


The output is like this:

1 2 3 4
5 6 7 8
5 6 7
1 2 3

The Java code:

import java.io.BufferedReader;
import java.io.BufferedWriter;
import java.io.File;
import java.io.FileInputStream;
import java.io.FileWriter;
import java.io.IOException;
import java.io.InputStreamReader;
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

public class Example {

	public static void main(String[] args) throws IOException {
		BufferedReader myInput = null;
		try {
			String input = "test.csv";
			String output = "output.txt";
			// we create an object for writing the output file
			BufferedWriter writer = new BufferedWriter(new FileWriter(output)); 
			// Objects to read the file
			FileInputStream fin = new FileInputStream(new File(input));
			myInput = new BufferedReader(new InputStreamReader(fin));
			int count = 0; // to count the number of lines

			String thisLine; // variable to read a line
			// we read the file line by line until the end of the file
			while ((thisLine = myInput.readLine()) != null) {
				// if not the first line, we create a new line
				if(count !=0){
					writer.newLine(); // create new line
				}
				// we split the line according to commas
				String[] split = thisLine.split(",");
				// we use a set to store the values to avoid duplicates
				// because they are not allowed in a transaction
				Set<Integer> values = new HashSet<Integer>();
				for(int i=0; i< split.length; i++){
					values.add(Integer.parseInt(split[i]));
				}
				// sort the transaction in ascending order
				List<Integer> listValues = new ArrayList<Integer>(values);
				Collections.sort(listValues);
				// for each item, we will output them
				for (int i=0; i<listValues.size(); i++) {
					if(i != listValues.size() -1){
						// if not the last item
						// write the item with an item separator
						writer.write(listValues.get(i) + " ");
					}else{
						// if the last item
						// write the item
						writer.write(listValues.get(i) + "");
					}
				}
				count++; // increase the number of lines
			}
			// close the output file
			writer.close();
		} catch (Exception e) {
			e.printStackTrace();
		} finally {
			if (myInput != null) {
				myInput.close();
			}
		}
	}
}
Basically, I create two objects: one object for reading a file, and one object for writing a file. Then I read the input file line by line, and split each line into tokens according to ",". Then, I write the lines to the output file in the other format.

Depending on the input format, it could be more complicated than this. But here, it is very simple.
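
As one illustration of a slightly more complicated case, here is a minimal sketch for a file whose columns hold category names rather than integers. It assumes that each distinct (column, value) pair should become one item, that identifiers are assigned in order of first appearance, and that a "?" or an empty field means a missing value that should be skipped. The class name CategoricalExample, the method names, and the sample records are hypothetical.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class CategoricalExample {

	// dictionary from "column=value" to item identifier, in order of first appearance
	private final Map<String, Integer> itemIds = new LinkedHashMap<>();

	// Convert one comma-separated record into a sorted transaction of item identifiers.
	List<Integer> convert(String line) {
		String[] split = line.split(",");
		List<Integer> transaction = new ArrayList<>();
		for (int column = 0; column < split.length; column++) {
			String value = split[column].trim();
			if (value.isEmpty() || value.equals("?")) {
				continue; // assumption: "?" or an empty field marks a missing value
			}
			// the same word in two different columns must give two different items
			String key = column + "=" + value;
			Integer id = itemIds.get(key);
			if (id == null) {
				id = itemIds.size() + 1; // next unused identifier, starting at 1
				itemIds.put(key, id);
			}
			transaction.add(id);
		}
		Collections.sort(transaction);
		return transaction;
	}

	// The dictionary is needed later to translate the mined itemsets back to names.
	Map<String, Integer> dictionary() {
		return itemIds;
	}

	public static void main(String[] args) {
		CategoricalExample converter = new CategoricalExample();
		// hypothetical records with three categorical attributes
		String[] lines = { "sunny,hot,no", "rainy,hot,yes", "sunny,?,yes" };
		for (String line : lines) {
			System.out.println(converter.convert(line));
		}
		// prints:
		// [1, 2, 3]
		// [2, 4, 5]
		// [1, 5]
		System.out.println(converter.dictionary());
		// prints: {0=sunny=1, 1=hot=2, 2=no=3, 0=rainy=4, 2=yes=5}
	}
}
```

In a real conversion you would write each transaction to the output file as in the code above, and also save the dictionary, because otherwise the item numbers in the mining results cannot be mapped back to the original attribute values.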
