Re: CPT+ Scoring

The Data Mining Forum

CPT+ Scoring

Posted by: lv1984

Date: March 15, 2018 03:20AM

I'm testing CPT+ but I can't understand how to interpret the scores.
Are they already normalized?
What are the minimum and maximum values a score can take?
Can I interpret a score as a probability, or must it be normalized first?

Re: CPT+ Scoring

Posted by: Luis

Date: March 20, 2018 07:04AM

Hey lv1984,

No, the scores are not normalized by default. If you want to normalize them yourself, you could do it in the CountTable::getBestSequence() method:

// Filling a sequence with the best |count| items
Sequence seq = new Sequence(-1);
sd.normalize(); // normalize() does not exist yet; implement it in the ScoreDistribution class
List<Integer> bestItems = sd.getBest(1.002);

However, even after normalization the scores do not represent real proportions, because the individual subscores are multiplied together in the CountTable::push() method.
You would have to rewrite the scoring system if you are interested in real proportional probabilities.
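
As a rough illustration of the normalize() step mentioned above, here is a minimal sketch. It is not taken from the CPT+ source: the class ScoreNormalizer and the Map<Integer, Double> representation of item scores are assumptions made only for this example, and the real ScoreDistribution class may store its scores differently. The sketch divides every score by the sum of all scores, so the results add up to 1, but as noted above they are still not true probabilities.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical helper, not part of the CPT+ source. It stands in for a
// normalize() method that one could add to the ScoreDistribution class.
public class ScoreNormalizer {

    // Returns a new map in which each score is divided by the sum of all scores.
    // Assumes scores are non-negative; returns an empty map if the sum is 0.
    public static Map<Integer, Double> normalize(Map<Integer, Double> scores) {
        double total = 0.0;
        for (double s : scores.values()) {
            total += s;
        }
        Map<Integer, Double> normalized = new HashMap<>();
        if (total <= 0.0) {
            return normalized; // nothing to normalize
        }
        for (Map.Entry<Integer, Double> e : scores.entrySet()) {
            normalized.put(e.getKey(), e.getValue() / total);
        }
        return normalized;
    }

    public static void main(String[] args) {
        Map<Integer, Double> scores = new HashMap<>();
        scores.put(1, 3.0); // item 1 has raw score 3.0
        scores.put(2, 1.0); // item 2 has raw score 1.0
        System.out.println(normalize(scores));
    }
}
```

With this approach the relative ranking of the items is unchanged; only the scale differs.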

Disclaimer: I am just a student who worked with this algorithm for half a year, so I cannot guarantee correctness ;-)

Best regards,
Luis

Re: CPT+ Scoring

Posted by: webmasterphilfv

Date: March 24, 2018 07:03AM

Thanks for answering, Luis :-)

Yes, the scores are not normalized in CPT+. The score for a prediction is the sum of its scores over all the sequences that are used to make that prediction. Thus, the sum can be greater than 1. Besides, it cannot be negative.
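
To make the point about sums concrete, here is a small sketch of how a score that is accumulated over several sequences can exceed 1 while never becoming negative. The class ScoreAccumulation, the method addContribution() and the contribution values are made up for illustration; they are not the actual CPT+ code or its actual per-sequence formula.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical illustration, not the CPT+ implementation: each matching
// sequence adds a non-negative contribution to the score of a candidate item.
public class ScoreAccumulation {

    // Adds one sequence's contribution to the running total of an item.
    public static void addContribution(Map<Integer, Double> totals, int item, double contribution) {
        if (contribution < 0.0) {
            throw new IllegalArgumentException("contributions are assumed to be non-negative");
        }
        totals.merge(item, contribution, Double::sum);
    }

    public static void main(String[] args) {
        Map<Integer, Double> totals = new HashMap<>();
        // Three sequences all support item 7 (made-up contribution values).
        addContribution(totals, 7, 0.5);
        addContribution(totals, 7, 0.75);
        addContribution(totals, 7, 0.25);
        // The total for item 7 is now 1.5, which is above 1.
        System.out.println(totals.get(7));
    }
}
```

Since every contribution is non-negative, the total has a lower bound of 0 but no fixed upper bound: it grows with the number of sequences that support the item.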

Yes, the scoring system could be replaced by something else. When designing CPT/CPT+, my student Ted actually tried different scoring systems, and the one provided in CPT+ is the one that we found to work best on our datasets. But other scoring systems may be better or have other advantages. We found that it was simpler to use scores that are not normalized.

Best regards,

Philippe

Edited 1 time(s). Last edit at 03/24/2018 07:04AM by webmasterphilfv.
