I'm testing an educational website to see whether it helps students. The site is for a plant identification class and shows pictures of the plants the students need to know. Students can practice by typing in the names online before they come to class and take a real quiz.
The problem is that overall usage alone doesn't seem to predict success. Sometimes students use the site and then get the plant right on the quiz. Other times they use the site and still get it wrong. So I'd like to compare the specific usage that precedes each of those two outcomes and see whether the patterns differ.
From the SQL server I can export a timestamped log of what each student did on the site. It looks like this:
time  student  species  photo  action
1:15  2        A        5      right answer
2:46  2        A        3      wrong answer
5:05  2        A        2      wrong answer
6:19  2        A        5      right answer
5:47  7        A        1      right answer
5:58  7        A        4      right answer
6:10  7        A        2      wrong answer
6:18  7        A        3      wrong answer
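Whatever tool ends up doing the mining, each student's rows first need to be grouped and put in time order. A minimal Python sketch, assuming the timestamps are minutes:seconds (if they are hours:minutes the ordering comes out the same; only durations would differ) and that everything after the photo column is the action text. The function names are hypothetical:

```python
from collections import defaultdict

# The log as shown in the question; the action is everything after the photo column.
LOG = """\
1:15 2 A 5 right answer
2:46 2 A 3 wrong answer
5:05 2 A 2 wrong answer
6:19 2 A 5 right answer
5:47 7 A 1 right answer
5:58 7 A 4 right answer
6:10 7 A 2 wrong answer
6:18 7 A 3 wrong answer"""


def to_seconds(stamp):
    """Convert an 'm:ss' stamp to seconds (assumes minutes:seconds)."""
    minutes, seconds = stamp.split(":")
    return int(minutes) * 60 + int(seconds)


def group_by_student(log_text):
    """Return {student: [(seconds, species, photo, outcome), ...]} in time order."""
    sessions = defaultdict(list)
    for line in log_text.splitlines():
        # maxsplit=4 keeps the multi-word action ("right answer") in one field.
        stamp, student, species, photo, action = line.split(maxsplit=4)
        outcome = action.split()[0]  # "right" or "wrong"
        sessions[int(student)].append((to_seconds(stamp), species, int(photo), outcome))
    for events in sessions.values():
        events.sort()  # tuples sort by timestamp first
    return dict(sessions)


sessions = group_by_student(LOG)
print(sessions[7])
```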
If Students 2 and 7 both failed to identify Species A on the quiz, I want the computer to tell me what they have in common. In this case, study time doesn't seem to matter, but both looked at Species A four times, both gave two right and two wrong answers, and both answered photos 2 and 3 wrong.
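The comparison described above can be sketched without SPMF as plain set intersection: describe each student by a set of feature labels (number of attempts, right and wrong counts, which photos were answered wrong) and keep the labels that every failing student shares. The feature names and helpers are hypothetical; the attempts are transcribed from the table above:

```python
from collections import Counter

# (photo, outcome) attempts on Species A per student, in time order, from the table.
ATTEMPTS = {
    2: [(5, "right"), (3, "wrong"), (2, "wrong"), (5, "right")],
    7: [(1, "right"), (4, "right"), (2, "wrong"), (3, "wrong")],
}


def features(attempts):
    """Describe one student's attempts as a set of feature labels."""
    counts = Counter(outcome for _, outcome in attempts)
    feats = {
        f"views={len(attempts)}",
        f"right={counts['right']}",  # Counter returns 0 for missing keys
        f"wrong={counts['wrong']}",
    }
    # One feature per photo the student answered wrong.
    feats |= {f"wrong_photo={photo}" for photo, outcome in attempts if outcome == "wrong"}
    return feats


def common_features(students, attempts_by_student):
    """Features shared by every listed student (e.g. everyone who failed)."""
    return set.intersection(*(features(attempts_by_student[s]) for s in students))


print(sorted(common_features([2, 7], ATTEMPTS)))
# → ['right=2', 'views=4', 'wrong=2', 'wrong_photo=2', 'wrong_photo=3']
```

Shared features are only half the answer: to find what distinguishes failure from success, the same features would also need to be checked against students who passed, keeping those that are common among failures and rare among successes.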
Can SPMF analyze this data and detect these patterns? How should I format the data, and which algorithm should I use? My first instinct is to combine all of Student 2's rows into one long string, and likewise all of Student 7's rows. Am I on the right track?
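Putting each student on one line is close to what SPMF expects for sequential pattern mining, except that an SPMF sequence database uses positive integers rather than text: one sequence per line, each itemset terminated by -1 and the whole sequence terminated by -2. A sketch, assuming each (photo, outcome) event becomes its own item in its own itemset; the labels and helper are hypothetical:

```python
# Event labels per student, in time order, built from the photo and action columns.
SESSIONS = {
    2: ["photo5_right", "photo3_wrong", "photo2_wrong", "photo5_right"],
    7: ["photo1_right", "photo4_right", "photo2_wrong", "photo3_wrong"],
}


def to_spmf_sequences(sessions):
    """Turn {student: [label, ...]} into SPMF sequence-database lines.

    Each event is written as its own itemset ("<item> -1") and each student's
    sequence ends with "-2". Labels are mapped to positive integers in order of
    first appearance; the returned mapping decodes mined patterns later.
    """
    item_ids = {}
    lines = []
    for student in sorted(sessions):
        tokens = []
        for label in sessions[student]:
            item = item_ids.setdefault(label, len(item_ids) + 1)
            tokens += [str(item), "-1"]
        tokens.append("-2")
        lines.append(" ".join(tokens))
    return lines, item_ids


lines, item_ids = to_spmf_sequences(SESSIONS)
for line in lines:
    print(line)  # one student per line
```

The lines could be saved to a text file and mined with a sequential pattern algorithm that SPMF provides, such as PrefixSpan, once for the students who failed and once for those who passed. Sequential mining respects order, though: Student 2 missed photo 3 before photo 2, while Student 7 missed photo 2 first, so that pair would not surface as a shared sequence. If order doesn't matter, a transaction database (one line per student listing its distinct items as space-separated integers) mined with an itemset algorithm such as FP-Growth or Apriori may fit better. Counts like "viewed four times" are not captured by either format unless they are encoded as items themselves.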