There are three common ways to choose the optimal probability cut-off:
- Youden's J Index
- Minimize the Euclidean distance of (sensitivity, specificity) from the point (1,1)
- Profit Maximization / Cost Minimization
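Youden's J index is used to select the optimal predicted probability cut-off. At a given cut-off, J equals the vertical distance between the ROC curve and the diagonal (chance) line. The optimal cut-off is the one where this distance is largest. The idea is to maximize the difference between the true positive rate (sensitivity) and the false positive rate (1 - specificity).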
Youden Index Formula
J = Sensitivity - (1 - Specificity) = Sensitivity + Specificity - 1
The optimal probability cut-off is where J is maximum.
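As a quick worked example, consider two hypothetical cut-offs. The sensitivity and specificity values below are made up for illustration and do not come from any fitted model. At 0.3, J = 0.90 - (1 - 0.50) = 0.40. At 0.5, J = 0.80 - (1 - 0.70) = 0.50, so 0.5 would be preferred. The same arithmetic in a SAS data step (the data set name youden_demo is hypothetical):

```sas
/* Hypothetical cut-offs with made-up sensitivity and specificity values */
data youden_demo;
  input cutoff sensitivity specificity;
  /* Youden's J = Sensitivity - (1 - Specificity) */
  J = sensitivity - (1 - specificity);
  datalines;
0.3 0.90 0.50
0.5 0.80 0.70
;
run;

/* The row with the larger J is the better cut-off */
proc print data=youden_demo noobs;
run;
```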
Euclidean Distance Formula
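D = Sqrt((1 - Sensitivity)^2 + (1 - Specificity)^2)
The point (1,1) represents perfect classification (sensitivity = specificity = 1), which is the top-left corner of the ROC plot. The optimal probability cut-off is where D is minimum.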
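Here are the same two hypothetical cut-offs again, using the same made-up values. At 0.3, D = Sqrt(0.10^2 + 0.50^2), about 0.510. At 0.5, D = Sqrt(0.20^2 + 0.30^2), about 0.361, so 0.5 is again preferred. Both criteria pick the same cut-off here, but they need not agree in general. The data set name distance_demo is hypothetical:

```sas
/* Hypothetical cut-offs with made-up sensitivity and specificity values */
data distance_demo;
  input cutoff sensitivity specificity;
  /* Distance from the ideal point (Sensitivity, Specificity) = (1, 1) */
  D = sqrt((1 - sensitivity)**2 + (1 - specificity)**2);
  datalines;
0.3 0.90 0.50
0.5 0.80 0.70
;
run;

/* The row with the smaller D is the better cut-off */
proc print data=distance_demo noobs;
run;
```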
SAS Code
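The code that follows assumes an existing data set named test with a binary response y (coded 0/1) and predictors x1 and x2. To try it end to end without your own data, you can first create a hypothetical simulated version of test. This is a minimal sketch. The seed, sample size and model coefficients are arbitrary assumptions chosen only for illustration.

```sas
/* Hypothetical simulated data set named TEST: 500 rows, binary y,
   predictors x1 and x2. Seed and coefficients are arbitrary assumptions. */
data test;
  call streaminit(12345);          /* fix the random seed for reproducibility */
  do i = 1 to 500;
    x1 = rand('normal');
    x2 = rand('normal');
    /* true event probability from an assumed logistic model */
    p  = 1 / (1 + exp(-(-0.5 + 1.2*x1 - 0.8*x2)));
    y  = rand('bernoulli', p);     /* 1 = event, 0 = non-event */
    output;
  end;
  drop i p;
run;
```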
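/* Fit the model. DESCENDING models P(y = 1). OUTROC= saves sensitivity
   and 1 - specificity at every candidate cut-off (_PROB_). */
proc logistic data = test descending;
model y = x1 x2 / outroc=rocstats;
run;
/* Compute Youden's J and the Euclidean distance D for each cut-off */
data check;
set rocstats;
_SPECIF_ = (1 - _1MSPEC_);
J = _SENSIT_ + _SPECIF_ - 1;
D = Sqrt((1-_SENSIT_)**2 + (1-_SPECIF_)**2);
run;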
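/* Cut-off(s) with the maximum Youden's J (ties return several rows) */
proc sql noprint;
create table cutoff as
select _PROB_ , J
from check
having J = max(J);
quit;
/* Cut-off(s) with the minimum distance D from (1,1) */
proc sql noprint;
create table cutoff1 as
select _PROB_ , D
from check
having D = min(D);
quit;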
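The third method, profit maximization / cost minimization, is listed above but not shown in the code. The following is a minimal sketch of the cost-minimization variant. It reuses the rocstats data set written by the PROC LOGISTIC step above, whose OUTROC= output includes the false-positive (_FALPOS_) and false-negative (_FALNEG_) counts at each cut-off. The cost values 1 and 5, the macro variables cost_fp and cost_fn, and the data set names cost and cutoff_cost are hypothetical assumptions. Replace the costs with the real business costs of each error type.

```sas
/* Hypothetical misclassification costs: adjust to your problem */
%let cost_fp = 1;   /* assumed cost of one false positive */
%let cost_fn = 5;   /* assumed cost of one false negative */

/* Total misclassification cost at each candidate cut-off */
data cost;
  set rocstats;
  total_cost = &cost_fp * _FALPOS_ + &cost_fn * _FALNEG_;
run;

/* Cut-off(s) with the minimum total cost (ties return several rows) */
proc sql noprint;
  create table cutoff_cost as
  select _PROB_ , total_cost
  from cost
  having total_cost = min(total_cost);
quit;
```

A profit-maximization version follows the same pattern. For example, you could add a hypothetical gain per true positive (_POS_) to the total and select the maximum instead of the minimum.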