In this article, we explain how to convert multiple columns (variables) to factors in R using both base R and the dplyr package. In R, categorical variables should be stored as factors. Numeric variables that are categorical in nature also need to be converted to factors so that R treats them as grouping variables rather than as numbers.
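
As a quick illustration of why the conversion matters, the sketch below uses a hypothetical `store_id` vector (not part of this article's data) that holds numeric codes. Summarised as a number, R reports quartiles and a mean; summarised as a factor, it reports a count per category.

```r
# Hypothetical example: numeric codes that are really categories
store_id <- c(1, 2, 1, 3, 2, 1)

# Treated as numeric, the codes are summarised like measurements
summary(store_id)

# Treated as a factor, each distinct code becomes a level
store_f <- factor(store_id)
summary(store_f)   # counts of observations per level
levels(store_f)    # the levels are stored as character labels
```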

Let's create a sample data frame called `mydata` with 5 variables (var1, var2, var3, var4 and var5).

    # Create a dummy data frame
    mydata <- data.frame(
      var1 = c("A", "B", "C"),
      var2 = c("X", "Y", "Z"),
      var3 = c(1, 2, 3),
      var4 = c(7, 8, 9),
      var5 = c("G", "H", "I")
    )
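
Before converting anything, it can help to check which class each column currently has. Whether character columns are already factors depends on the `stringsAsFactors` argument of `data.frame()`, whose default changed to `FALSE` in R 4.0.0. The sketch below uses two small hypothetical data frames (`as_chars` and `as_factors`) to show both outcomes.

```r
# Hypothetical data frames that differ only in stringsAsFactors
as_chars   <- data.frame(x = c("A", "B"), n = c(1, 2), stringsAsFactors = FALSE)
as_factors <- data.frame(x = c("A", "B"), n = c(1, 2), stringsAsFactors = TRUE)

# sapply() applies class() to every column and returns a named vector
classes_chars   <- sapply(as_chars, class)
classes_factors <- sapply(as_factors, class)

print(classes_chars)    # x is character, n is numeric
print(classes_factors)  # x is factor, n is numeric
```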

## How to Convert all Numeric Columns to Factor in R

In the data frame `mydata`, we have two numeric columns, var3 and var4. Rather than naming these two columns explicitly, we want to select every numeric column and convert it to a factor.

In **base R**, you can convert multiple columns (variables) to factor using the `lapply` function. `lapply` is part of the apply family of functions, which apply a function to each element of an object (here, each column of a data frame) without writing an explicit loop.

In the **dplyr** package, the `across` function allows you to apply a transformation across multiple columns. The `mutate` function from dplyr is used to modify the columns of a data frame. In this case, `where(is.numeric)` selects only the numeric columns. Then, the `as.factor` function is applied to convert those selected columns to factors.

Base R

    # Flag the numeric columns once, then convert only those
    num_cols <- sapply(mydata, is.numeric)
    mydata[num_cols] <- lapply(mydata[num_cols], as.factor)
    str(mydata)

dplyr

    library(dplyr)
    mydata <- mydata %>%
      mutate(across(where(is.numeric), as.factor))
    str(mydata)
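
To make the base R approach easier to follow, here is a step-by-step sketch of it on a small hypothetical data frame `df` (the names `is_num` and `converted` are illustrative only). `sapply` produces a logical flag per column, `lapply` returns a list of converted columns, and assigning that list back replaces only the flagged columns.

```r
# Hypothetical data frame with one character and two numeric columns
df <- data.frame(
  g = c("A", "B", "C"),
  a = c(1, 2, 3),
  b = c(7, 8, 9),
  stringsAsFactors = FALSE
)

# Step 1: one TRUE/FALSE flag per column
is_num <- sapply(df, is.numeric)

# Step 2: lapply() returns a list with one factor per selected column
converted <- lapply(df[is_num], as.factor)

# Step 3: assign the list back into the same columns
df[is_num] <- converted

sapply(df, class)
```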

## How to Convert all Columns to Factor in R

The `names(mydata)` command returns a character vector containing the names of all the columns in the data frame `mydata`.

Base R

    col_names <- names(mydata)
    # mydata[col_names] stays a data frame even if only one column is selected
    mydata[col_names] <- lapply(mydata[col_names], factor)
    str(mydata)

dplyr

    library(dplyr)
    col_names <- names(mydata)
    mydata <- mydata %>%
      mutate(across(all_of(col_names), as.factor))
    str(mydata)
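
The snippets in this article use both `factor` and `as.factor`. For character or numeric input they give the same result, but they can differ when the input is already a factor: `as.factor` returns it unchanged, while `factor` rebuilds it and drops unused levels. A minimal sketch with a hypothetical factor `f`:

```r
# Hypothetical factor: three levels defined, only two still in use
f <- factor(c("low", "mid", "high"))[1:2]

kept    <- as.factor(f)  # returned as-is, unused level "high" remains
dropped <- factor(f)     # rebuilt, unused level "high" is removed

levels(kept)
levels(dropped)
```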

## Converting Columns to Factor in R using Column Position

In this case, we are converting the first, second, third and fifth variables to factor variables. **mydata** is a data frame.

Base R

    # Column positions to convert (named cols to avoid masking base::names)
    cols <- c(1:3, 5)
    mydata[cols] <- lapply(mydata[cols], factor)
    str(mydata)

dplyr

    library(dplyr)
    cols <- c(1:3, 5)
    # all_of() tells across() that cols is an external vector, not a column name
    mydata <- mydata %>%
      mutate(across(all_of(cols), as.factor))
    str(mydata)

## Converting Columns to Factor in R using Column Names

In this case, we are converting two columns 'var2' and 'var5' to factor variables.

Base R

    # Column names to convert (named cols to avoid masking base::names)
    cols <- c('var2', 'var5')
    mydata[cols] <- lapply(mydata[cols], factor)
    str(mydata)

dplyr

    library(dplyr)
    cols <- c('var2', 'var5')
    # all_of() tells across() that cols is an external vector, not a column name
    mydata <- mydata %>%
      mutate(across(all_of(cols), as.factor))
    str(mydata)
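
Selection by position and selection by name follow the same base R pattern, so they can be wrapped in one small helper. The function below, `convert_to_factor`, is a hypothetical sketch (it is not part of base R or dplyr) that accepts either positions or names and stops early if a requested column does not exist.

```r
# Hypothetical helper: convert the given columns (positions or names) to factor
convert_to_factor <- function(df, cols) {
  if (is.character(cols)) {
    unknown <- setdiff(cols, names(df))
    if (length(unknown) > 0) {
      stop("Unknown column(s): ", paste(unknown, collapse = ", "))
    }
  } else if (any(cols < 1 | cols > ncol(df))) {
    stop("Column position out of range")
  }
  # df[cols] stays a data frame even when a single column is selected
  df[cols] <- lapply(df[cols], factor)
  df
}

# Hypothetical data frame used only for this demonstration
demo <- data.frame(
  var1 = c("A", "B"),
  var2 = c("X", "Y"),
  var3 = c(1, 2),
  stringsAsFactors = FALSE
)

by_position <- convert_to_factor(demo, c(1, 3))
by_name     <- convert_to_factor(demo, "var2")

sapply(by_position, class)
sapply(by_name, class)
```

Because the helper returns a modified copy, the original data frame is left unchanged unless you assign the result back to it.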

## Convert Columns to Factor in R Based on a Condition

Suppose you want to count the unique values in each column and convert to factor only those columns that have fewer than 4 unique values.

Base R

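    # TRUE for each column with fewer than 4 unique values
    col_names <- sapply(mydata, function(col) length(unique(col)) < 4)
    # mydata[col_names] stays a data frame even if only one column qualifies
    mydata[col_names] <- lapply(mydata[col_names], factor)
    str(mydata)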

dplyr

    library(dplyr)
    # TRUE for each column with fewer than 4 unique values
    col_names <- sapply(mydata, function(col) length(unique(col)) < 4)
    mydata <- mydata %>%
      mutate(across(all_of(names(col_names)[col_names]), as.factor))
    str(mydata)
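
In `mydata` every column has exactly three unique values, so the condition selects all of them. The base R sketch below uses a hypothetical data frame `survey` in which one column (`id`) has too many distinct values to qualify, which makes the effect of the condition visible. `vapply` is used here instead of `sapply` only to guarantee a logical result.

```r
# Hypothetical data frame: id has 6 unique values, the others have fewer than 4
survey <- data.frame(
  id    = 1:6,
  group = c("a", "b", "a", "b", "a", "b"),
  score = c(1, 2, 1, 2, 3, 3),
  stringsAsFactors = FALSE
)

# One logical flag per column: TRUE when the column has fewer than 4 unique values
few_levels <- vapply(survey, function(col) length(unique(col)) < 4, logical(1))

# Convert only the flagged columns
survey[few_levels] <- lapply(survey[few_levels], factor)

sapply(survey, class)  # id stays integer, group and score become factor
```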

