# How to Calculate AUC (Area Under Curve) in R

In this article, we will cover how to calculate the AUC (Area Under Curve) in R.

## What is Area Under Curve?

The Area Under Curve (AUC) is a metric used to evaluate the performance of a binary classification model. It measures the ability of a model to distinguish between events and non-events.

The ROC (Receiver Operating Characteristic) curve is created by plotting the true positive rate (sensitivity) against the false positive rate (1 − specificity) at various classification thresholds; the AUC is the area under this curve. The true positive rate is the proportion of events correctly classified as events, and the false positive rate is the proportion of non-events incorrectly classified as events.
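As a small illustration, the true and false positive rates at a single threshold can be computed in base R. The labels and probabilities below are made up for this sketch:

```
# Hypothetical predicted probabilities and true labels (1 = event)
labels <- c(1, 1, 1, 0, 0)
probs  <- c(0.90, 0.60, 0.30, 0.40, 0.20)

# Classify as an event when the probability exceeds a chosen threshold
threshold <- 0.5
pred_pos  <- probs > threshold

tpr <- sum(pred_pos & labels == 1) / sum(labels == 1)  # true positive rate
fpr <- sum(pred_pos & labels == 0) / sum(labels == 0)  # false positive rate
c(TPR = tpr, FPR = fpr)
```

Sweeping the threshold from 1 down to 0 and recording each (FPR, TPR) pair traces out the ROC curve.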

The AUC ranges from 0 to 1, where:

• AUC = 0.5: The classifier performs no better than random chance.
• AUC > 0.5: The classifier performs better than random chance. A higher AUC value indicates better performance, with 1 representing a perfect classifier.
• AUC < 0.5: The classifier performs worse than random chance; its predictions are systematically inverted.
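Equivalently, the AUC is the probability that a randomly chosen event receives a higher score than a randomly chosen non-event. A minimal rank-based sketch of this in base R, using made-up scores and labels:

```
# Hypothetical scores and true labels (1 = event)
labels <- c(0, 0, 1, 1)
scores <- c(0.10, 0.40, 0.35, 0.80)

n_pos <- sum(labels == 1)
n_neg <- sum(labels == 0)

# Sum of the events' ranks, minus its minimum possible value,
# divided by the number of event/non-event pairs
auc <- (sum(rank(scores)[labels == 1]) - n_pos * (n_pos + 1) / 2) /
  (n_pos * n_neg)
auc  # 0.75: 3 of the 4 event/non-event pairs are ranked correctly
```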

We need to have two R packages, `ISLR` and `ROCR`, installed as prerequisites. If they are not already installed, you can install them using the following commands:

```
install.packages("ISLR")
install.packages("ROCR")
```

The `performance()` function from the ROCR package is used to calculate the Area Under the Curve (AUC) as a performance metric for the model. The following R code builds a logistic regression model for binary classification on the "Default" dataset from the "ISLR" package and then calculates the AUC.

```
library(ISLR)
library(ROCR)

# Load a binary classification dataset from ISLR package
mydata <- ISLR::Default

# Set seed
set.seed(1234)

# 70% of dataset goes to training data and remaining 30% to test data
train_idx  <- sample(c(TRUE, FALSE), nrow(mydata), replace=TRUE, prob=c(0.7,0.3))
train <- mydata[train_idx, ]
test <- mydata[!train_idx, ]

# Build logistic regression model
model <- glm(default~., family="binomial", data=train)

# Calculate predicted probability of default of test data
predicted <- predict(model, test, type="response")

# Storing Model Performance Scores
pred  <- prediction(predicted, test$default)

# Calculating Area under Curve
perf <- performance(pred,"auc")
auc <- as.numeric(perf@y.values)
auc
```

Result: 0.9466106

## How Does the Above Code Work?

1. The "Default" dataset is loaded from the "ISLR" package.
2. A seed is set using `set.seed(1234)` to ensure reproducibility, meaning the same split (and hence the same output) is generated on every run.
3. The dataset is split into a training set (70%) and a test set (30%) using random sampling.
4. A logistic regression model is built using the `glm` function, where "default" is the binary dependent variable, and the rest of the variables are used as independent variables.
5. The `predict` function is used to calculate the predicted probabilities of default for the test data based on the logistic regression model.
6. The "ROCR" package is used to create a prediction object (pred) based on the predicted probabilities and the true default values from the test set.
7. The performance of the model is evaluated by calculating the Area Under the Curve (AUC) using the `performance` function from "ROCR."
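As a sanity check on what `performance(pred, "auc")` computes, the same area can be obtained with the trapezoidal rule over the ROC points. The sketch below is self-contained, using made-up scores and labels rather than the Default model:

```
# Hypothetical scores and true labels (1 = event)
labels <- c(1, 0, 1, 0)
scores <- c(0.80, 0.40, 0.35, 0.10)

# Sort by decreasing score and accumulate TPR/FPR, starting from (0, 0)
ord <- order(scores, decreasing = TRUE)
lab <- labels[ord]
tpr <- c(0, cumsum(lab) / sum(lab))
fpr <- c(0, cumsum(1 - lab) / sum(1 - lab))

# Trapezoidal rule: segment width times average segment height
auc <- sum(diff(fpr) * (head(tpr, -1) + tail(tpr, -1)) / 2)
auc  # 0.75
```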
## Plot ROC Curve in R

Let's see how we can plot the ROC curve in R. In the following code, we first compute the ROC curve using the `performance()` function with "tpr" (True Positive Rate, or sensitivity) and "fpr" (False Positive Rate) as arguments. Then, we use the `plot()` function to draw the ROC curve. The `abline()` function draws the diagonal line from (0, 0) to (1, 1), representing the ROC curve of a random classifier.

```
# Plot ROC curve
roc_curve <- performance(pred, "tpr", "fpr")
plot(roc_curve, col = "blue", main = "ROC Curve", lwd = 2)
abline(0, 1, col = "gray", lty = 2, lwd = 1)
text(0.5, 0.3, paste("AUC =", round(auc, 2)), adj = c(0.5, 0.5), col = "black", cex = 1.5)
```