How to Calculate AUC (Area Under Curve) in R

In this article we will cover how to calculate AUC (Area Under Curve) in R.

What is Area Under Curve?

The Area Under Curve (AUC) is a metric used to evaluate the performance of a binary classification model. It measures the ability of a model to distinguish between events and non-events.

The ROC (Receiver Operating Characteristic) curve is created by plotting the true positive rate (sensitivity) against the false positive rate (1-specificity) at various classification thresholds. The AUC is the area under this curve. The true positive rate is the proportion of events correctly classified as events, and the false positive rate is the proportion of non-events incorrectly classified as events.
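As a minimal sketch of these two rates, the following base R snippet computes them at a single threshold. The vectors actual and score and the threshold of 0.5 are made-up illustration values, not taken from the dataset used later in this article.

```r
# Toy data (hypothetical): 1 = event, 0 = non-event
actual <- c(1, 1, 1, 0, 0, 0, 0, 1)
score  <- c(0.9, 0.8, 0.4, 0.7, 0.3, 0.2, 0.1, 0.6)

# Classify as an event when the score is at least the threshold
threshold <- 0.5
predicted_class <- as.integer(score >= threshold)

# True positive rate: share of events classified as events
tpr <- sum(predicted_class == 1 & actual == 1) / sum(actual == 1)

# False positive rate: share of non-events classified as events
fpr <- sum(predicted_class == 1 & actual == 0) / sum(actual == 0)

c(tpr = tpr, fpr = fpr)
```

Each threshold gives one (fpr, tpr) point. Repeating the calculation over many thresholds traces out the ROC curve.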

The AUC ranges from 0 to 1, where:

  • AUC < 0.5: The classifier performs worse than random chance, which usually means its predictions are inverted.
  • AUC = 0.5: The classifier performs no better than random chance.
  • AUC > 0.5 and ≤ 1: The classifier performs better than random chance. A higher AUC value indicates better performance, with 1 representing a perfect classifier.
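One way to build intuition for these values: the AUC equals the probability that a randomly chosen event receives a higher score than a randomly chosen non-event, with ties counted as half. The sketch below checks this on made-up data. The helper name pairwise_auc and the toy vectors are hypothetical and are not part of any package.

```r
# Hypothetical helper: AUC as the share of (event, non-event) pairs
# in which the event has the higher score (ties count as half)
pairwise_auc <- function(actual, score) {
  pos <- score[actual == 1]
  neg <- score[actual == 0]
  wins <- outer(pos, neg, ">")
  ties <- outer(pos, neg, "==")
  (sum(wins) + 0.5 * sum(ties)) / (length(pos) * length(neg))
}

# Toy data (hypothetical): 1 = event, 0 = non-event
actual <- c(1, 1, 1, 0, 0, 0, 0, 1)
score  <- c(0.9, 0.8, 0.4, 0.7, 0.3, 0.2, 0.1, 0.6)

auc_good    <- pairwise_auc(actual, score)    # scores rank events high
auc_flipped <- pairwise_auc(actual, -score)   # same scores with the order reversed

c(auc_good = auc_good, auc_flipped = auc_flipped)
```

Reversing the scores turns an AUC of a into 1 - a, which is why a value well below 0.5 suggests the predictions are inverted rather than useless.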

We need two R packages, ISLR and ROCR, installed as prerequisites. If they are not already installed, you can install them using the following commands.

install.packages("ISLR")
install.packages("ROCR")

The performance() function from the ROCR package is used to calculate the Area Under the Curve (AUC) as a performance metric for the model. The following R code builds a logistic regression model for binary classification on the "Default" dataset from the "ISLR" package and then calculates the AUC.

library(ISLR)
library(ROCR)

# Load a binary classification dataset from ISLR package
mydata <- ISLR::Default

# Set seed
set.seed(1234)

# Roughly 70% of rows go to training data and the remaining 30% to test data
train_idx  <- sample(c(TRUE, FALSE), nrow(mydata), replace=TRUE, prob=c(0.7,0.3))
train <- mydata[train_idx, ]
test <- mydata[!train_idx, ]

# Build logistic regression model
model <- glm(default~., family="binomial", data=train)

# Calculate predicted probability of default of test data
predicted <- predict(model, test, type="response")

# Create a ROCR prediction object from predicted probabilities and actual labels
pred  <- prediction(predicted, test$default)

# Calculating Area under Curve
perf <- performance(pred,"auc")
auc <- perf@y.values[[1]]
auc

Result: 0.9466106

How does the above code work?
  1. The "Default" dataset is loaded from the "ISLR" package.
  2. A seed is set using set.seed(1234) to ensure reproducibility. This means the same output is generated on every run.
  3. The dataset is split into a training set (approximately 70%) and a test set (approximately 30%) using random sampling.
  4. A logistic regression model is built using the glm function, where "default" is the binary dependent variable, and the rest of the variables are used as independent variables.
  5. The predict function is used to calculate the predicted probabilities of default for the test data based on the logistic regression model.
  6. The "ROCR" package is used to create a prediction object (pred) based on the predicted probabilities and the true default values from the test set.
  7. The performance of the model is evaluated by calculating the Area Under the Curve (AUC) using the performance function from "ROCR."
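Steps 2 and 3 can be checked in isolation. The sketch below uses only base R and a hypothetical row count n instead of the real dataset. It shows that the same seed reproduces the same split, and that sampling TRUE/FALSE with prob gives a share close to 70% rather than exactly 70%. The last part is an alternative, not what the code above does: it samples row numbers to get an exact 70% split.

```r
n <- 10000  # hypothetical number of rows

# Same seed, same call: the two logical index vectors are identical
set.seed(1234)
idx_a <- sample(c(TRUE, FALSE), n, replace = TRUE, prob = c(0.7, 0.3))
set.seed(1234)
idx_b <- sample(c(TRUE, FALSE), n, replace = TRUE, prob = c(0.7, 0.3))

same_split  <- identical(idx_a, idx_b)
train_share <- mean(idx_a)  # close to 0.7, but usually not exactly 0.7

# Alternative: sample exactly 70% of the row numbers without replacement
set.seed(1234)
train_rows  <- sample(seq_len(n), size = round(0.7 * n))
exact_share <- length(train_rows) / n

c(same_split = same_split, train_share = train_share, exact_share = exact_share)
```

With the row-number approach, the training set would be selected with mydata[train_rows, ] and the test set with mydata[-train_rows, ].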

Plot ROC Curve in R

Let's see how we can plot the ROC curve in R. In the following code, we first calculate the ROC curve using the performance function with "tpr" (True Positive Rate or Sensitivity) and "fpr" (False Positive Rate) as arguments. Then, we use the plot function to plot the ROC curve. The abline function is used to draw the diagonal line from (0,0) to (1,1), representing the ROC curve of a random classifier.

# Plot ROC curve
roc_curve <- performance(pred, "tpr", "fpr")
plot(roc_curve, col = "blue", main = "ROC Curve", lwd = 2)
abline(0, 1, col = "gray", lty = 2, lwd = 1)
text(0.5, 0.3, paste("AUC =", round(auc, 2)), adj = c(0.5, 0.5), col = "black", cex = 1.5)
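To see roughly what goes into such a curve, the following base R sketch builds the ROC points by hand and integrates them with the trapezoidal rule. It is an illustration on made-up data, not ROCR's internal implementation. The vectors actual and score are hypothetical.

```r
# Toy data (hypothetical): 1 = event, 0 = non-event
actual <- c(1, 1, 1, 0, 0, 0, 0, 1)
score  <- c(0.9, 0.8, 0.4, 0.7, 0.3, 0.2, 0.1, 0.6)

# Use every distinct score as a threshold, from highest to lowest.
# Inf classifies nothing as an event and gives the starting point (0, 0).
thresholds <- c(Inf, sort(unique(score), decreasing = TRUE))

# One (fpr, tpr) point per threshold
tpr <- sapply(thresholds, function(t) sum(score >= t & actual == 1) / sum(actual == 1))
fpr <- sapply(thresholds, function(t) sum(score >= t & actual == 0) / sum(actual == 0))

# Area under the piecewise-linear curve via the trapezoidal rule
auc_trapezoid <- sum(diff(fpr) * (head(tpr, -1) + tail(tpr, -1)) / 2)
auc_trapezoid

# Plot the hand-built curve with the random-classifier diagonal
plot(fpr, tpr, type = "b", col = "blue", main = "ROC Curve (manual)",
     xlab = "False positive rate", ylab = "True positive rate")
abline(0, 1, col = "gray", lty = 2)
```

The curve starts at (0, 0) with the strictest threshold and ends at (1, 1) once every observation is classified as an event.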
About Author:
Deepanshu Bhalla

Deepanshu founded ListenData with a simple objective - Make analytics easy to understand and follow. He has over 10 years of experience in data science. During his tenure, he worked with global clients in various domains like Banking, Insurance, Private Equity, Telecom and HR.
