# R: Converting Multiple Columns to Factor

In this article, we will explain how you can convert multiple columns (variables) to factor in R using both base R and the dplyr package.

In R, categorical variables need to be stored as factors. Numeric variables that are categorical in nature (for example, coded groups) need to be converted to factor so that R treats them as grouping variables rather than as continuous values.
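
To see why the type matters, here is a minimal sketch using a hypothetical vector `rating` (not part of the original example) that holds the codes 1 to 3. As a numeric vector, `summary()` reports quantiles and a mean; as a factor, it reports a count per level.

```r
# Hypothetical coded variable: 1 = low, 2 = medium, 3 = high
rating <- c(1, 2, 2, 3, 3, 3)

# Treated as numeric, R computes numeric summaries (min, mean, max, ...)
summary(rating)

# Treated as a factor, R counts the observations in each level
rating_f <- as.factor(rating)
summary(rating_f)
levels(rating_f)
```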

Let's create a sample data frame called `mydata` having 5 variables.

```r
# Create a dummy data frame
mydata <- data.frame(
  var1 = c("A", "B", "C"),
  var2 = c("X", "Y", "Z"),
  var3 = c(1, 2, 3),
  var4 = c(7, 8, 9),
  var5 = c("G", "H", "I")
)
```
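
Before converting anything, it can help to check how R stored each column. The sketch below rebuilds the same data frame and sets `stringsAsFactors = FALSE` explicitly (an assumption added here so the result does not depend on the R version), then lists the class of every column.

```r
# Same sample data; stringsAsFactors = FALSE keeps strings as character
mydata <- data.frame(
  var1 = c("A", "B", "C"),
  var2 = c("X", "Y", "Z"),
  var3 = c(1, 2, 3),
  var4 = c(7, 8, 9),
  var5 = c("G", "H", "I"),
  stringsAsFactors = FALSE
)

# One class per column: character for var1, var2, var5; numeric for var3, var4
col_classes <- sapply(mydata, class)
col_classes
```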

## How to Convert Numeric Columns to Factor

In the dataframe named 'mydata', we have two numeric columns 'var3' and 'var4'. We do not want to explicitly name these columns while converting them to factor.

The `is.numeric` function is used to identify the numeric columns. Then, the `as.factor` function is applied to convert those columns to factors.

Base R

```r
numeric_cols <- sapply(mydata, is.numeric)
mydata[numeric_cols] <- lapply(mydata[numeric_cols], as.factor)
str(mydata)
```

In base R, you can convert multiple columns (variables) to factor using the `lapply` and `sapply` functions. Both apply a function to each element of a list or vector (here, each column of the data frame). The difference between them is that `lapply` always returns a list, whereas `sapply` simplifies the result to a vector or matrix when possible.
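
The difference in return type can be seen with a small sketch. The two-column data frame `demo` is a hypothetical stand-in for `mydata`, used only to keep the example short.

```r
# Hypothetical two-column data frame: one numeric, one character column
demo <- data.frame(
  var3 = c(1, 2, 3),
  var5 = c("G", "H", "I"),
  stringsAsFactors = FALSE
)

as_list <- lapply(demo, is.numeric)  # one list element per column
as_vec  <- sapply(demo, is.numeric)  # simplified to a named logical vector

class(as_list)  # "list"
class(as_vec)   # "logical"
```

This is why `sapply` suits the column test (a logical vector is convenient for indexing), while `lapply` suits the conversion (a list of columns can be assigned back into the data frame).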

dplyr

```r
library(dplyr)

mydata <- mydata %>%
  mutate(across(where(is.numeric), as.factor))

str(mydata)
```

In the dplyr package, the `across` function allows you to apply a transformation across multiple columns, and `where(is.numeric)` selects the columns for which `is.numeric` returns `TRUE`. The `mutate` function is used to modify the columns of a data frame.

## How to Convert All Columns to Factor

The `names(mydata)` command returns a character vector containing the names of all the columns in the data frame `mydata`.

Base R

```r
col_names <- names(mydata)
mydata[, col_names] <- lapply(mydata[, col_names], factor)
str(mydata)
```
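
When every column is being converted, the intermediate vector of names is optional. A common base R idiom, shown here as an alternative sketch rather than a replacement for the code above, assigns into `mydata[]` so that the object remains a data frame while each column is replaced.

```r
# Rebuild a small version of the sample data for a self-contained example
mydata <- data.frame(
  var1 = c("A", "B", "C"),
  var3 = c(1, 2, 3),
  var5 = c("G", "H", "I"),
  stringsAsFactors = FALSE
)

# The empty brackets keep the data frame structure; only the columns change
mydata[] <- lapply(mydata, factor)
str(mydata)
```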

dplyr

```r
library(dplyr)

col_names <- names(mydata)
mydata <- mydata %>%
  mutate(across(all_of(col_names), as.factor))

str(mydata)
```

## Converting Columns to Factor using Column Position

In this case, we are converting the first, second, third and fifth columns of the data frame `mydata` to factor variables.

Base R

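```r
# Column positions to convert; called `positions` rather than `names`
# to avoid confusion with the base function names()
positions <- c(1:3, 5)
mydata[, positions] <- lapply(mydata[, positions], factor)
str(mydata)
```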
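
One caveat with the `mydata[, ...]` form: when a single position is selected, `mydata[, 2]` returns a plain vector instead of a data frame, so `lapply` iterates over the individual values rather than over columns. The sketch below, which uses a reduced two-column version of the sample data, shows list-style indexing (`mydata[pos]`) as one way to keep the conversion working for any number of columns.

```r
mydata <- data.frame(
  var1 = c("A", "B", "C"),
  var3 = c(1, 2, 3),
  stringsAsFactors = FALSE
)

# Matrix-style indexing drops a single column to a vector
is.data.frame(mydata[, 2])  # FALSE
# List-style indexing always returns a data frame
is.data.frame(mydata[2])    # TRUE

# Works for one position or many
pos <- 2
mydata[pos] <- lapply(mydata[pos], factor)
str(mydata)
```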

dplyr

```r
library(dplyr)

positions <- c(1:3, 5)
# all_of() tells dplyr that `positions` is an external vector, not a column
mydata <- mydata %>%
  mutate(across(all_of(positions), as.factor))

str(mydata)
```

## Converting Columns to Factor using Column Names

In this case, we are converting two columns 'var2' and 'var5' to factor variables.

Base R

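```r
# Column names to convert; called `cols` rather than `names`
# to avoid confusion with the base function names()
cols <- c('var2', 'var5')
mydata[, cols] <- lapply(mydata[, cols], factor)
str(mydata)
```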
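
If the same conversion is needed in several places, it can be wrapped in a small function. The helper `convert_to_factor` below is hypothetical (it is not part of base R or dplyr) and is only a sketch; it stops with a clear message when a requested column does not exist instead of failing later.

```r
# Hypothetical helper: convert the named columns of `df` to factor
convert_to_factor <- function(df, cols) {
  missing_cols <- setdiff(cols, names(df))
  if (length(missing_cols) > 0) {
    stop("Columns not found: ", paste(missing_cols, collapse = ", "))
  }
  # List-style indexing works for one column or many
  df[cols] <- lapply(df[cols], factor)
  df
}

mydata <- data.frame(
  var2 = c("X", "Y", "Z"),
  var3 = c(1, 2, 3),
  var5 = c("G", "H", "I"),
  stringsAsFactors = FALSE
)

mydata <- convert_to_factor(mydata, c("var2", "var5"))
str(mydata)
```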

dplyr

```r
library(dplyr)

cols <- c('var2', 'var5')
# all_of() tells dplyr that `cols` is an external vector of column names
mydata <- mydata %>%
  mutate(across(all_of(cols), as.factor))

str(mydata)
```

## Convert Columns to Factor Based on Condition

Suppose you want to convert only those columns that have fewer than 4 unique values.

Base R

```r
# Logical vector: TRUE for columns with fewer than 4 unique values
col_names <- sapply(mydata, function(col) length(unique(col)) < 4)
mydata[, col_names] <- lapply(mydata[, col_names], factor)
```
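
In the 3-row sample data every column has fewer than 4 unique values, so the condition selects all of them. To see the condition actually filter columns, here is a sketch on a hypothetical 6-row data frame called `survey` (the name and values are invented for illustration).

```r
# Hypothetical data: id and score have 6 unique values, group has 2, region has 3
survey <- data.frame(
  id     = 1:6,
  group  = c(1, 2, 1, 2, 1, 2),
  region = c("N", "S", "E", "N", "S", "E"),
  score  = c(10.5, 11.2, 9.8, 12.1, 10.9, 11.7),
  stringsAsFactors = FALSE
)

# TRUE only for group and region
few_levels <- sapply(survey, function(col) length(unique(col)) < 4)
few_levels

# List-style indexing keeps this safe even if only one column qualifies
survey[few_levels] <- lapply(survey[few_levels], factor)
str(survey)
```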

dplyr

```r
library(dplyr)

# Logical vector: TRUE for columns with fewer than 4 unique values
col_names <- sapply(mydata, function(col) length(unique(col)) < 4)
mydata <- mydata %>%
  mutate(across(all_of(names(col_names)[col_names]), as.factor))

str(mydata)
```
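
The condition can also be passed straight to `where()`, which avoids building the logical vector first. This is an alternative sketch, assuming dplyr 1.0.0 or later; the data frame `survey` is hypothetical and has 6 rows so that the condition does not select every column.

```r
library(dplyr)

# Hypothetical data: only group (2 values) and region (3 values) qualify
survey <- data.frame(
  id     = 1:6,
  group  = c(1, 2, 1, 2, 1, 2),
  region = c("N", "S", "E", "N", "S", "E"),
  score  = c(10.5, 11.2, 9.8, 12.1, 10.9, 11.7),
  stringsAsFactors = FALSE
)

# where() applies the predicate to each column and keeps those returning TRUE
survey <- survey %>%
  mutate(across(where(function(col) n_distinct(col) < 4), as.factor))

str(survey)
```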

Deepanshu founded ListenData with a simple objective - Make analytics easy to understand and follow. He has over 10 years of experience in data science. During his tenure, he worked with global clients in various domains like Banking, Insurance, Private Equity, Telecom and HR.
