AnalyticBridge

Logistic regression: ROC curve

Hi All,

We already get the value of the area under the curve (c) from logistic regression in SAS, so why is there a need to plot the ROC curve? Or is there no need to plot it at all?

If we plot the ROC curve over a range of cutoff values, how do we select the cutoff? For example, with this option on the model statement in SAS:

/ ctable pprob=(0.1 to 1 by 0.1)

Also, how do we proceed with that cutoff value in further analysis?

Any help would be greatly appreciated.

Comment by Sandeep Kumar Srivastava on June 20, 2012 at 3:36am

Thanks Mike,

Yes, I know the concepts behind the ROC curve. My doubt was only about how to find the appropriate cut-off when we have 10 different probability cut-offs (the output of ctable pprob=(0.1 to 1 by 0.1)), and I got that from Kevin's explanation.

Comment by Mike O'Neil on June 19, 2012 at 5:20am

Sandeep, the ROC curve is a plot whose points are calculated from the counts in the confusion matrix at a given model score cut-off. If you take the output of ctable pprob=(0.1 to 1 by 0.1), you have the TN, TP, FN and FP counts that let you calculate the x and y coordinates on the ROC curve for 10 different probability cut-offs. What you then need to understand is the cost matrix associated with TN, TP, FN and FP, so that you can decide where the optimal cut-off lies for your particular problem. From your question, it looks like you need to do some more study to understand what a ROC curve represents and how to use a risk score generated by a logistic regression, or indeed a risk score generated by any model (it does not actually have to be a statistical model). Google "Nuts and Bolts of Data Mining: Classifiers & ROC Curves" by Tim Graettinger, which is quite a good article for understanding these concepts.
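To make the point above concrete, here is a minimal sketch in Python rather than SAS of how one ROC point is computed from a row of confusion-matrix counts, and how a cost matrix can pick a cut-off. The counts, the two cost values and all names (Counts, roc_point, total_cost) are made up for illustration; they are not from the thread or from SAS output.

```python
from typing import NamedTuple, Tuple


class Counts(NamedTuple):
    """Confusion-matrix counts at one probability cut-off (illustrative)."""
    cutoff: float
    tp: int
    fp: int
    tn: int
    fn: int


def roc_point(c: Counts) -> Tuple[float, float]:
    """Return (x, y) = (1 - specificity, sensitivity) for one cut-off."""
    sensitivity = c.tp / (c.tp + c.fn)
    specificity = c.tn / (c.tn + c.fp)
    return 1.0 - specificity, sensitivity


def total_cost(c: Counts, cost_fp: float, cost_fn: float) -> float:
    """Total misclassification cost; correct decisions are assumed to cost 0."""
    return c.fp * cost_fp + c.fn * cost_fn


# Made-up counts for three cut-offs (100 events, 900 non-events).
table = [
    Counts(0.1, tp=95, fp=500, tn=400, fn=5),
    Counts(0.3, tp=80, fp=200, tn=700, fn=20),
    Counts(0.5, tp=50, fp=50, tn=850, fn=50),
]

# One ROC point per cut-off.
points = [roc_point(c) for c in table]

# Assumed costs: a missed event (FN) is 10 times as costly as a false alarm (FP).
best = min(table, key=lambda c: total_cost(c, cost_fp=1.0, cost_fn=10.0))

for c, (x, y) in zip(table, points):
    print(f"cutoff={c.cutoff:.1f}  x={x:.3f}  y={y:.3f}  "
          f"cost={total_cost(c, 1.0, 10.0):.0f}")
print("lowest-cost cutoff:", best.cutoff)
```

Changing the two cost values moves the chosen cut-off, which is why the cost matrix has to come from the business problem and not from the model.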

Comment by Sandeep Kumar Srivastava on June 18, 2012 at 11:34pm

Thanks a lot, Kevin.

Much appreciated. These concepts are really helpful to me.

Comment by Kevin Pedde on June 18, 2012 at 8:12am

This is what I have done in the past to get the Youden Index from proc logistic in SAS. Add the outroc= option to your model statement (your code may look slightly different; outroc= outputs the sensitivity and 1 - specificity at each cutoff):

model dep_var(event='1') = in_var1 in_var2 in_var3 / outroc=rocstats;

Then I have a data step that calculates J:

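data Youden;
  set rocstats;
  /* outroc= stores 1 - specificity in _1MSPEC_ */
  _SPECIF_ = 1 - _1MSPEC_;
  /* Youden index: J = sensitivity + specificity - 1 */
  J = _SENSIT_ + _SPECIF_ - 1;
run;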

Then get max J value:

proc means data=Youden max;
  var J;
run;

I then use a proc print statement to output the row with that maximum J; its _PROB_ value is the cutoff. Now that you have the cutoff, every score above this value can be classified as a "success". I have used this in the past, but I don't use the Youden Index anymore. I always end up targeting people in the top 1 - 3 deciles or, depending on what I am modeling, deciles 3 - 7. It really depends on what you are modeling.
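For readers who want to check the arithmetic outside SAS, the same Youden calculation can be sketched in Python. The rows below are made-up values standing in for the _PROB_, _SENSIT_ and _1MSPEC_ columns of an outroc= data set, and youden_cutoff is a hypothetical helper name, not part of any library.

```python
def youden_cutoff(rows):
    """Return (cutoff, J) for the row with the largest Youden index.

    rows: iterable of (cutoff, sensitivity, one_minus_specificity) tuples.
    """
    best_cutoff, best_j = None, float("-inf")
    for cutoff, sensitivity, one_minus_spec in rows:
        specificity = 1.0 - one_minus_spec
        j = sensitivity + specificity - 1.0
        if j > best_j:
            best_cutoff, best_j = cutoff, j
    return best_cutoff, best_j


# Illustrative rows: (cutoff, sensitivity, 1 - specificity).
roc_rows = [
    (0.9, 0.20, 0.01),
    (0.7, 0.50, 0.05),
    (0.5, 0.75, 0.15),
    (0.3, 0.90, 0.40),
    (0.1, 0.98, 0.80),
]

cutoff, j = youden_cutoff(roc_rows)
print(f"cutoff with max J: {cutoff} (J = {j:.2f})")
```

Because J weights sensitivity and specificity equally, this picks the statistical cutoff only; it says nothing about the business costs of each kind of error.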

Comment by Sandeep Kumar Srivastava on June 17, 2012 at 10:56am

First of all, thanks to Jozo and Kevin for responding. It really helped me a lot.

Now, could you please explain how to choose the right cut-off?

Is it the value where sensitivity is equal to specificity (Goods = Bads)?

Comment by Kevin Pedde on June 17, 2012 at 7:36am

I usually go straight to putting the scores from the regression into deciles and see if I get an even distribution in each decile. I also look at how many "successes" I get in each decile and how those are distributed from the top decile to the bottom decile. From there I will run several models to see which gives me the highest percentage of "successes" in the top deciles, while keeping an even distribution of scores and not overfitting the model.

Like Jozo said, there is a statistical method of choosing a cutoff (the Youden Index) and there is a business cutoff.
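The decile check described above can be sketched as follows. This is an illustrative Python version, not the poster's SAS workflow; the scores and outcomes are made up, and decile_table is a hypothetical helper name.

```python
def decile_table(scores, outcomes, n_bins=10):
    """Rank records by score (highest first), split them into n_bins
    near-equal groups, and count the successes in each group.

    Returns a list of (bin_number, n_records, n_successes, success_rate).
    """
    ranked = sorted(zip(scores, outcomes), key=lambda p: p[0], reverse=True)
    n = len(ranked)
    table = []
    for b in range(n_bins):
        start = b * n // n_bins
        end = (b + 1) * n // n_bins
        group = ranked[start:end]
        successes = sum(y for _, y in group)
        rate = successes / len(group) if group else 0.0
        table.append((b + 1, len(group), successes, rate))
    return table


# Made-up data: 20 scored records, 6 of them successes (outcome = 1).
scores = [i / 20 for i in range(20)]
outcomes = [0] * 12 + [1, 0, 1, 0, 1, 1, 1, 1]

table = decile_table(scores, outcomes)
for bin_number, n_records, n_successes, rate in table:
    print(f"decile {bin_number:2d}: n={n_records}  "
          f"successes={n_successes}  rate={rate:.2f}")
```

A model that ranks well concentrates the successes in the first few deciles, which is the pattern to compare across candidate models.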

Comment by Jozo Kovac on June 15, 2012 at 8:57pm

Shape is important. You care about how "bads" are distributed. It helps you to:

- Verify if you have enough degrees of freedom

- Explain how well your model catches "bads" at both ends of the curve

- Choose the right cut-off

- Understand whether you have caught all "bads" in the first 10% or 50%, or whether there are also some in the last decile. That makes a difference!

There are two cut-offs:

- One of them is statistical: it separates goods and bads in the optimal way.

- The other is business: a man-made decision about how to apply your model in further applications. Who is accepted and who is declined, who is targeted by marketing and who is left in peace. It's an art to set this one right, or hard work beyond data mining.

Which one are you looking for?

Comment by Kevin Pedde on June 15, 2012 at 12:22pm

I generally don't find a need to plot the ROC curve; I just care about the c-statistic, or area under the curve. Are you asking how to find the Youden Index (cutoff value) from logistic regression in SAS?
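As background on the c-statistic mentioned above: it can be read as the share of (event, non-event) pairs in which the event gets the higher score, with ties counted as half. Here is a minimal Python sketch of that definition on made-up data; c_statistic is a hypothetical helper name, and SAS may group predicted probabilities before counting pairs, so its reported c can differ slightly.

```python
def c_statistic(scores, outcomes):
    """Concordance: fraction of (event, non-event) pairs where the event
    has the higher score; tied pairs count as half."""
    events = [s for s, y in zip(scores, outcomes) if y == 1]
    non_events = [s for s, y in zip(scores, outcomes) if y == 0]
    concordant = 0.0
    for e in events:
        for n in non_events:
            if e > n:
                concordant += 1.0
            elif e == n:
                concordant += 0.5
    return concordant / (len(events) * len(non_events))


# Made-up scores and outcomes (1 = event, 0 = non-event).
scores = [0.9, 0.8, 0.7, 0.6, 0.4, 0.2]
outcomes = [1, 1, 0, 1, 0, 0]

c = c_statistic(scores, outcomes)
print(f"c = {c:.3f}")
```

Since c is a single summary number, two models with the same c can still have differently shaped ROC curves.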
