The following code builds a logistic regression model for binary classification and evaluates its performance on the training data using the AUC metric. We use the ROCR package to calculate the Area Under the Curve (AUC) for the model.
library(ISLR)
library(ROCR)

# Load a binary classification dataset from ISLR package
mydata <- ISLR::Default

# Set seed for reproducibility
set.seed(1234)

# 70% of dataset goes to training data and remaining 30% to test data
train_idx <- sample(c(TRUE, FALSE), nrow(mydata), replace=TRUE, prob=c(0.7,0.3))
train <- mydata[train_idx, ]
test <- mydata[!train_idx, ]

# Build logistic regression model
model <- glm(default~., family="binomial", data=train)

# Calculate predicted probability of default
predicted <- predict(model, type="response")

# Storing Model Performance Scores
pred <- prediction(predicted, train$default)

# Calculating Area under Curve
perf <- performance(pred,"auc")
auc <- as.numeric(perf@y.values)
auc
Result : auc = 0.9522256
A common error when calculating the AUC of a training dataset in R is to pass the training dataset to the predict function the same way we pass test data. This is where the mistake occurs.
Incorrect Syntax
predicted <- predict(model, train, type="response")
It is incorrect because we are telling R to treat our training dataset as a new dataset and score it again, instead of using the fitted values already stored in the model.
Correct Syntax
predicted <- predict(model, type="response")
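To see what the call without a data argument returns, here is a minimal sketch. It assumes a toy model fitted on the built-in mtcars data (not the Default data used above), and the names toy_model, p_train and same_values are only illustrative. With no new data supplied, predict() hands back the fitted probabilities for the rows the model was trained on, which are the same values fitted() reads from the model object.

```r
# Illustrative only: a toy model on the built-in mtcars data, not ISLR::Default
toy_model <- glm(am ~ wt + hp, family = "binomial", data = mtcars)

# No data argument: predict() returns fitted probabilities for the training rows
p_train <- predict(toy_model, type = "response")

# fitted() reads the same values straight from the model object
same_values <- isTRUE(all.equal(p_train, fitted(toy_model)))
same_values
```

Each value in p_train lines up with one row of the data used to fit the model, so it can be paired with that row's true label when building the prediction object.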
Steps to Calculate AUC of Training Dataset
- Split the dataset into training and test sets, with about 70% of the data going to the training set and the remaining 30% to the test set.
- Build a logistic regression model on the training data, with "default" as the response variable and all other variables as predictors.
- Calculate the predicted probabilities of default for the training data using the logistic regression model.
- Store the model performance scores by creating a prediction object from the predicted probabilities and the true binary labels of the training data.
- Calculate the Area Under Curve (AUC) for the model using the ROCR package.
- To do so, call the performance() function from the ROCR package with "auc" as the performance measure.
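As a cross-check on the last two steps, the AUC can also be computed by hand from ranks: it is the probability that a randomly chosen positive case gets a higher score than a randomly chosen negative one. The sketch below is not part of ROCR; rank_auc is a hypothetical helper written for this illustration, and the scores and labels are made-up toy values.

```r
# Hypothetical helper (not part of ROCR): rank-based AUC
rank_auc <- function(scores, labels) {
  # labels: logical or 0/1 vector, TRUE/1 marks the positive class
  pos <- as.logical(labels)
  n_pos <- as.numeric(sum(pos))
  n_neg <- as.numeric(sum(!pos))
  r <- rank(scores)  # tied scores receive average ranks
  # Sum of positive ranks, shifted and scaled to the 0-1 range
  (sum(r[pos]) - n_pos * (n_pos + 1) / 2) / (n_pos * n_neg)
}

# Toy data: 3 of the 4 positive/negative pairs are ordered correctly
scores <- c(0.1, 0.4, 0.35, 0.8)
labels <- c(0, 0, 1, 1)
rank_auc(scores, labels)  # 0.75
```

With the objects from the code above, a call such as rank_auc(predicted, train$default == "Yes") should give a value very close to the auc reported by ROCR, assuming "Yes" is the positive class.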