This article explains two ways to score a validation dataset with PROC LOGISTIC in SAS. Put simply, scoring means using a model you have already trained to make predictions for new data.
1. SCORE Option in PROC LOGISTIC
The SCORE option in PROC LOGISTIC is used to score new observations using a fitted logistic regression model. In other words, it applies coefficients of the model to new data to calculate predicted probabilities for those new observations.
proc logistic data=training;
    model Sbp_flag = age_flag bmi_flag / lackfit ctable pprob=0.5;
    output out=test p=ppred;              /* predictions for the training data */
    score data=validation out=Logit_File; /* predictions for the validation data */
run;
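To make concrete what "applying the coefficients" means, the following DATA step is a minimal sketch that scores the validation data by hand. The coefficient values (-1.2, 0.8, 0.5) and the names manual_score, xbeta and p_event are hypothetical, chosen only for illustration; real values come from the parameter estimates table of the fitted model. The sketch also assumes that age_flag and bmi_flag are numeric 0/1 variables.

```sas
/* Hypothetical coefficients for illustration only; replace them with the
   estimates reported by PROC LOGISTIC for the fitted model. */
data manual_score;
    set validation;
    /* Linear predictor: intercept plus coefficient times each predictor */
    xbeta = -1.2 + 0.8*age_flag + 0.5*bmi_flag;
    /* The inverse logit turns the linear predictor into a probability */
    p_event = 1 / (1 + exp(-xbeta));
run;
```

With the real estimates plugged in, p_event should agree with the predicted probability that the SCORE statement writes to Logit_File for the response level being modeled. Keep in mind that, for a binary response, PROC LOGISTIC models the lower ordered response value by default unless the DESCENDING or EVENT= option is used.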
2. OUTMODEL / INMODEL Option in PROC LOGISTIC
The OUTMODEL= option specifies the name of a SAS data set that stores the information about the fitted model. That data set is then passed to the INMODEL= option of a later PROC LOGISTIC step, which scores new data without refitting the model.
/* Step 1: fit the model and save it */
proc logistic data=training outmodel=model;
    model Sbp_flag = age_flag bmi_flag / lackfit ctable pprob=0.5;
    output out=test p=ppred;
run;

/* Step 2: score new data with the saved model */
proc logistic inmodel=model;
    score data=validation out=valid;
run;
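Once the validation data is scored, a natural next step is to inspect the result. The sketch below assumes that the validation data set contains the observed response Sbp_flag. In that case the SCORE output holds the observed level in F_Sbp_flag, the predicted level in I_Sbp_flag, and the predicted probabilities in variables whose names start with P_.

```sas
/* Peek at the first few scored observations */
proc print data=valid (obs=5);
    var F_Sbp_flag I_Sbp_flag P_:;
run;

/* Cross-tabulate observed vs. predicted response level */
proc freq data=valid;
    tables F_Sbp_flag * I_Sbp_flag / norow nocol nopercent;
run;
```

The cross-tabulation gives a simple classification table for the validation data, which can be compared with the CTABLE output produced on the training data.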
Please, when is the best time to split a data set into training and validation: at the beginning, right after forming the modeling data set, or after cleaning the data (missing value imputation and outlier treatment)?
I split the data after cleaning it: after missing value imputation but before outlier treatment. I do outlier treatment during variable transformation, after the initial run of PROC LOGISTIC.
Split the data into training and modeling after cleaning (removing missing values and outliers) and transformation. After that, run the PROC LOGISTIC model.
Is the predicted value we get from that the odds ratio?
May I know where I can get your sample training data?