The Data Mining Forum
IMPORTANT: This is the old Data Mining forum.
I keep it online so that you can read the old messages.

Please post your new messages in the new forum: https://forum2.philippe-fournier-viger.com/index.php
 
why are there two different trees for the same data set
Posted by: user
Date: February 21, 2022 05:25AM

I am working on the Pima Indians Diabetes Database in Weka. I noticed that the tree built by the J48 decision tree is smaller than the one built by Random Tree. I am unable to understand why the trees are different when the data set is the same.

Re: why are there two different trees for the same data set
Date: February 21, 2022 04:23PM

Hi,

There are different algorithms to build decision trees. A typical algorithm builds the tree from top to bottom, node by node. To decide which attribute to use in a node, the algorithm uses some criterion to compare the attributes.
There exist various criteria, such as the Gini index, information gain, etc.

For example, the ID3 algorithm uses information gain (which is based on entropy), C4.5 uses the gain ratio, and CART uses the Gini index. If two algorithms don't use the same criterion to select attributes and build trees, then the results can be different.
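To make this concrete, here is a small sketch in plain Python showing how two criteria can disagree on the same data. The toy data set and the attribute names `patient_id` and `glucose` are made up for illustration (they are not the Pima data), and the helper functions are my own simplified versions, not the code used by Weka.

```python
from collections import Counter
from math import log2


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    n = len(labels)
    return -sum((c / n) * log2(c / n) for c in Counter(labels).values())


def gini(labels):
    """Gini impurity of a list of class labels."""
    n = len(labels)
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())


def split_by(rows, labels, attr):
    """Group the class labels by the value of attribute `attr`."""
    groups = {}
    for row, label in zip(rows, labels):
        groups.setdefault(row[attr], []).append(label)
    return list(groups.values())


def information_gain(labels, groups):
    """Entropy before the split minus weighted entropy after it."""
    n = len(labels)
    return entropy(labels) - sum(len(g) / n * entropy(g) for g in groups)


def gain_ratio(labels, groups):
    """Information gain divided by the entropy of the split itself."""
    n = len(labels)
    split_info = -sum(len(g) / n * log2(len(g) / n) for g in groups)
    return information_gain(labels, groups) / split_info if split_info > 0 else 0.0


def gini_gain(labels, groups):
    """Gini impurity before the split minus weighted impurity after it."""
    n = len(labels)
    return gini(labels) - sum(len(g) / n * gini(g) for g in groups)


def best_attribute(rows, labels, attributes, score):
    """Return the attribute whose split gets the highest score."""
    return max(attributes, key=lambda a: score(labels, split_by(rows, labels, a)))


# Hypothetical toy data: 16 records, one attribute with a unique value per
# record (patient_id) and one binary attribute (glucose).
rows = [{"patient_id": i, "glucose": "high" if i < 8 else "low"} for i in range(16)]
labels = ["pos"] * 7 + ["neg"] + ["neg"] * 7 + ["pos"]
attributes = ["patient_id", "glucose"]

for name, score in [("information gain", information_gain),
                    ("gain ratio", gain_ratio),
                    ("Gini gain", gini_gain)]:
    print(name, "->", best_attribute(rows, labels, attributes, score))
```

On this toy data, information gain prefers the attribute with many distinct values, while the gain ratio penalizes it and prefers the binary attribute, so the root node of the tree already differs.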

Besides, some algorithms use additional techniques, such as pruning the tree after it is built.

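I have not read the details of RandomTree and J48 recently, so I am not sure of everything they do, but they are different algorithms. As far as I know, J48 is Weka's implementation of C4.5 and prunes the tree by default, while RandomTree considers only a random subset of the attributes at each node and does no pruning, which would explain why its tree is larger. So it is quite normal that they don't have the same output.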

Best regards,

Re: why are there two different trees for the same data set
Posted by: user
Date: February 21, 2022 10:59PM

Thank you for the detailed response. I will look into it further.

For a data set with two classes, verified and not_verified, I get the following 'detailed accuracy by class':

TP Rate  FP Rate  Precision  Recall  F-Measure  MCC    ROC Area  PRC Area  Class
0.746    0.440    0.760      0.746   0.753      0.303  0.653     0.732     verified
0.560    0.254    0.542      0.560   0.550      0.303  0.653     0.457     not_verified


Why does the verified class have a higher F-Measure than the not_verified class?

Re: why are there two different trees for the same data set
Date: February 25, 2022 01:20AM

Just by looking at these numbers, I don't know. I would recommend looking up the definitions.

Maybe someone else can answer.
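
As a starting point for checking the definitions, here is a small sketch, assuming the F-Measure column is the usual F1 score, that is, the harmonic mean of precision and recall. The function name `f_measure` is my own; the precision and recall values are copied from the table in the question.

```python
def f_measure(precision, recall):
    """F1 score: harmonic mean of precision and recall."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# (precision, recall, reported F-Measure) copied from the table above
reported = {
    "verified": (0.760, 0.746, 0.753),
    "not_verified": (0.542, 0.560, 0.550),
}

for cls, (p, r, f_reported) in reported.items():
    # The recomputed value may differ slightly in the last digit,
    # because the inputs in the table are already rounded.
    print(cls, "recomputed:", round(f_measure(p, r), 3), "reported:", f_reported)
```

Under that assumption, the F-Measure of a class is high only when both its precision and its recall are high. Both values are lower for not_verified than for verified, so its F-Measure is lower as well.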



Edited 1 time(s). Last edit at 02/25/2022 01:21AM by webmasterphilfv.



This forum is powered by Phorum and provided by P. Fournier-Viger (© 2012).