# AnalyticBridge

# Determining the BEST decision tree

I have developed 3 decision trees using CHAID segmentation on the same data (say, for instance, one tree with 3 nodes, one with 4 nodes, and the last one with 2 nodes, one of which has 2 sub-nodes). How do I select the BEST one?

Is there any measure like KS, Gini, etc. that we usually use for logistic regression?

### Replies to This Discussion

Friends...
Any idea would be appreciated.

Hey,

You can try the capture rate by percentile. Say the overall bad rate is 2%: what percentage of those bads gets captured within the top 30% or so of cases (this will be an approximate number) for each of the three trees? This is purely from a tree-efficiency point of view. In other words, you can calculate the cumulative lift index at a given percentage of cases.

Sarat
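
The capture-rate comparison Sarat describes can be sketched roughly as follows. This is a minimal illustration, not anyone's production method: it assumes each tree scores a case with the target rate of its terminal node, and the function names (`capture_rate`, `cumulative_lift`) and the toy data are hypothetical. When the 30% cut-off falls inside a node, the sketch credits a proportional share of that node's targets, which is one reasonable convention among several.

```python
from collections import defaultdict


def capture_rate(scores, targets, top_fraction=0.30):
    """Fraction of all targets (e.g. bads) found in the top `top_fraction`
    of cases when cases are ranked by descending score."""
    if len(scores) != len(targets):
        raise ValueError("scores and targets must have the same length")
    total_targets = sum(targets)
    if total_targets == 0:
        return 0.0

    # A tree gives every case in a node the same score, so aggregate by score.
    groups = defaultdict(lambda: [0, 0])  # score -> [cases, targets]
    for score, target in zip(scores, targets):
        groups[score][0] += 1
        groups[score][1] += target

    remaining = len(scores) * top_fraction  # number of cases still to take
    captured = 0.0
    for score in sorted(groups, reverse=True):
        cases, hits = groups[score]
        take = min(cases, remaining)
        # If the cut-off falls inside a node, credit a proportional share.
        captured += hits * take / cases
        remaining -= take
        if remaining <= 0:
            break
    return captured / total_targets


def cumulative_lift(scores, targets, top_fraction=0.30):
    """Capture rate divided by the fraction of cases targeted."""
    return capture_rate(scores, targets, top_fraction) / top_fraction


# Toy data (made up for illustration): 1 = bad, 0 = good.
targets = [1, 0, 1, 0, 0, 0, 1, 0, 0, 0]
# Hypothetical node-level scores from two candidate trees on the same cases.
tree_a_scores = [0.9, 0.9, 0.9, 0.2, 0.2, 0.2, 0.2, 0.1, 0.1, 0.1]
tree_b_scores = [0.4, 0.4, 0.4, 0.4, 0.4, 0.2, 0.2, 0.2, 0.2, 0.2]

for name, scores in [("tree A", tree_a_scores), ("tree B", tree_b_scores)]:
    print(name,
          "capture@30%:", round(capture_rate(scores, targets), 3),
          "lift@30%:", round(cumulative_lift(scores, targets), 3))
```

The tree that captures more of the bads in the same top 30% of cases is the more efficient one by this measure; the 30% figure itself is only an example and should follow from how many cases you can realistically act on.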
Yes, 'Lift in Response Rate' may be calculated with the results of a logistic regression, just as it can be with an IDT. For example, one looks for the cut-off that generates the maximum response rate while targeting the minimum percentage of the population of interest.
A couple quick points:
---------
Do a quick Google search on "Gains table", "Gains chart", and "lift chart" and you'll find some good info about comparing how good various models are. E.g., here is one link for you: http://www.statsoft.com/textbook/statistics-glossary/g/
---------
Remember to evaluate your model using a hold-out sample, if possible (to guard against overfitting, etc.).
---------
Also remember to consider what the practical, business, or scientific goals of the analysis are. Independent of traditional measures of model performance (which typically look at performance across the full dataset), a model that is not ideal for some purposes might still reveal important findings or insights. E.g., a tree model might not do a great job overall, but it might identify a fraction of the data (a small terminal node) that has a very high percentage of targets. Depending on your domain, this small terminal node could be valuable (e.g., everyone in that group is likely to be committing tax fraud, or is likely to have cancer, etc.).
---------
Also, just seeing which variables are important for the prediction can have value.
---------
Good luck!
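
To make the gains-table, hold-out, and small-terminal-node points above concrete, here is a minimal sketch of a node-level gains table. It assumes you have already assigned each hold-out case to a terminal node of the tree being evaluated; the function name `gains_table`, the node labels, and the data are hypothetical. For simplicity the nodes are ranked by their hold-out target rate; a stricter check would rank them by the rate seen in the training sample and then read off the hold-out results.

```python
def gains_table(leaf_ids, targets):
    """Build one row per terminal node, sorted by descending target rate,
    with cumulative share of cases and cumulative share of targets."""
    if len(leaf_ids) != len(targets):
        raise ValueError("leaf_ids and targets must have the same length")

    stats = {}  # leaf -> (cases, targets)
    for leaf, target in zip(leaf_ids, targets):
        cases, hits = stats.get(leaf, (0, 0))
        stats[leaf] = (cases + 1, hits + target)

    total_cases = len(targets)
    total_hits = sum(targets)
    ordered = sorted(stats.items(),
                     key=lambda item: item[1][1] / item[1][0],
                     reverse=True)

    rows, cum_cases, cum_hits = [], 0, 0
    for leaf, (cases, hits) in ordered:
        cum_cases += cases
        cum_hits += hits
        rows.append({
            "leaf": leaf,
            "cases": cases,
            "target_rate": hits / cases,
            "cum_pct_cases": cum_cases / total_cases,
            "cum_pct_targets": cum_hits / total_hits if total_hits else 0.0,
        })
    return rows


# Made-up hold-out sample: terminal node of each case and its outcome.
holdout_leaves = ["A", "A", "B", "B", "B", "B", "C", "C", "C", "C"]
holdout_targets = [1, 1, 0, 1, 0, 0, 0, 0, 0, 0]

rows = gains_table(holdout_leaves, holdout_targets)
for row in rows:
    print(row)
```

In this toy data, node A holds only 20% of the cases but every case in it is a target, which is the kind of small, high-purity terminal node that can be valuable even if the tree as a whole is unremarkable. Building the same table for each candidate tree on the same hold-out sample gives a like-for-like comparison.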