This tutorial explains various ways to create a ROC curve in SAS and read off its AUC (area under the curve).
A ROC curve shows how well a model can differentiate between events and non-events across different classification thresholds. It plots the true positive rate (sensitivity) against the false positive rate (1 - specificity) for a binary predictive model.
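To make the two rates concrete, here is a minimal sketch that computes them at a single threshold. The dataset toy, its values, and the 0.5 cutoff are made up for illustration only; a ROC curve is simply these two numbers recomputed at every possible cutoff.

```sas
/* Hypothetical toy data: actual outcome (1 = event) and a model score */
data toy;
   input actual score;
   datalines;
1 0.9
1 0.6
1 0.4
0 0.7
0 0.3
0 0.2
;
run;

/* In PROC SQL a comparison evaluates to 1 (true) or 0 (false), */
/* so SUM() of a condition counts the rows that satisfy it.     */
proc sql;
   select sum(actual = 1 and score >= 0.5) / sum(actual = 1)
             as sensitivity,
          sum(actual = 0 and score >= 0.5) / sum(actual = 0)
             as one_minus_specificity
   from toy;
quit;
```

At the 0.5 cutoff, two of the three events are flagged (sensitivity = 2/3) and one of the three non-events is flagged (1 - specificity = 1/3), which gives one point on the ROC curve.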
Let's create a sample SAS dataset for demonstration. In this dataset, the dependent variable is "attrition" and the independent variables are years of experience (yoe) and annual salary in dollars (salary).
/* attrition is read from columns 1-2 and yoe from columns 3-4, */
/* so each data line must start in column 1.                    */
data mydata;
   input attrition 1-2 yoe 3-4 salary;
   label attrition='Employee Attrition';
   datalines;
0 9 98217
1 5 53477
1 2 22447
1 2 21458
0 2 25990
0 10 106338
1 6 67279
0 4 46575
1 8 83782
0 7 76975
0 4 48110
1 7 74134
1 8 87071
1 9 94795
0 1 16762
0 7 74261
0 8 88497
1 9 92901
0 7 76878
0 9 94273
1 8 87021
;
run;
Method 1 : Creating ROC Curve using PLOTS option
The following code runs PROC LOGISTIC with the "descending" option to tell SAS to model 1 as the event (attritors). The plots(only)=roc option creates the ROC curve for the model.
proc logistic data=mydata descending plots(only)=roc;
   model attrition = yoe salary;
run;
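PROC LOGISTIC prints the AUC in the title of the ROC plot and as the "c" statistic in the association table. An AUC closer to 1 indicates a model that separates events from non-events well. An AUC of 0.5 means the model does no better than random guessing and is of no use. Make sure to validate the model on a dataset other than the training data before drawing conclusions about its performance.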
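One way to check performance on data the model has not seen is the SCORE statement of PROC LOGISTIC. The sketch below is only an illustration: the split into train and valid (every third row held out) and all dataset names other than mydata are assumptions, and with 21 rows the resulting numbers are far too noisy to mean much.

```sas
/* Hypothetical split of the sample data: every third row is held out */
data train valid;
   set mydata;
   if mod(_n_, 3) = 0 then output valid;
   else output train;
run;

/* Fit on the training rows, then score the held-out rows.            */
/* OUTROC= on SCORE saves ROC coordinates for the validation data,    */
/* and FITSTAT prints fit statistics (including AUC) for that data.   */
proc logistic data=train descending;
   model attrition = yoe salary;
   score data=valid out=valid_scored outroc=valid_roc fitstat;
run;
```

The valid_roc dataset has the same _sensit_ and _1mspec_ columns as an OUTROC= dataset from the MODEL statement, so it can be plotted in the same way.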
Method 2 : Creating ROC Curve using PROC GPLOT
In this method, we store the sensitivity and (1 - specificity) values in a new dataset using the OUTROC= option of the MODEL statement in PROC LOGISTIC, then plot them with PROC GPLOT.
proc logistic data=mydata desc;
   model attrition = yoe salary / outroc=rocdata;
run;

proc gplot data=rocdata;
   symbol1 i=join v=none c=red line=1;
   plot _sensit_ * _1mspec_;
run;
quit;
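PROC GPLOT is part of SAS/GRAPH, so if that product is not licensed on your system, the same rocdata dataset can be drawn with PROC SGPLOT instead. The following is a sketch of that alternative rather than part of the original method; the axis labels, colors, and the dashed diagonal (the line a random model would follow) are illustrative choices.

```sas
/* Plot the OUTROC= dataset created above with PROC SGPLOT */
proc sgplot data=rocdata noautolegend;
   /* ROC curve: sensitivity against 1 - specificity */
   series x=_1mspec_ y=_sensit_ / lineattrs=(color=red);
   /* Reference diagonal for a model that guesses at random */
   lineparm x=0 y=0 slope=1 / lineattrs=(color=gray pattern=dash);
   xaxis label="1 - Specificity" min=0 max=1;
   yaxis label="Sensitivity" min=0 max=1;
run;
```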
Method 3 : Creating ROC Curve for Any Model
Suppose you have predicted probabilities from a decision tree or random forest model and want to create a ROC curve in SAS. The approach below is universal and not limited to logistic regression models.
data pred;
   input prob;
   datalines;
0.73
0.68
0.67
0.31
0.1
0
0.28
0.78
0.45
0.95
0.03
0.75
0.66
0.69
0.31
0.91
0.35
0.93
0.56
0.02
0.55
;
run;
The first step is to create a dataset that contains both the dependent variable and the predicted probabilities using the MERGE statement. Because there is no BY statement, the two datasets are combined row by row, so they must be in the same order. The next step is to use the "nofit" option in PROC LOGISTIC. It tells SAS that we don't want to fit a logistic regression model; instead, the ROC statement with pred=prob creates the ROC curve from the predicted probabilities in the dataset.
data finalData;
   merge mydata pred;
run;

proc logistic data=finalData;
   model attrition(event='1') = prob / nofit;
   roc pred=prob;
   ods select ROCcurve;
run;
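If you also need the AUC as a number rather than only on the plot, one option is to capture the ROC association table that the ROC statement produces. This is a sketch under the assumption that the ODS table is named ROCAssociation in your SAS release; the output dataset name auc_table is arbitrary.

```sas
/* Same NOFIT setup as above, but save the ROC association */
/* statistics (which include the area under the curve)     */
proc logistic data=finalData;
   model attrition(event='1') = prob / nofit;
   roc pred=prob;
   ods output ROCAssociation=auc_table;
run;

/* Inspect the saved statistics */
proc print data=auc_table;
run;
```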