R : Converting Multiple Columns to Factor

In this article, we explain how to convert multiple columns (variables) to factor in R using both base R and the dplyr package. In R, categorical variables need to be stored as factors. Numeric variables that are categorical in nature (for example, numeric codes for groups) must also be converted to factor so that R treats them as grouping variables rather than as quantities.
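To see why the conversion matters, here is a minimal sketch using a made-up vector called region_code (a hypothetical name, not part of the article's data). As a numeric vector, R summarises it arithmetically; as a factor, R treats each distinct value as a group.

```r
# Hypothetical example data: region codes stored as numbers
region_code <- c(1, 2, 2, 3, 3, 3)

# As numeric, R treats the codes as quantities (mean, quartiles, ...)
numeric_summary <- summary(region_code)

# As factor, R treats each distinct code as a group and counts its members
region_factor <- factor(region_code)
factor_counts <- table(region_factor)

print(numeric_summary)
print(levels(region_factor))
print(factor_counts)
```

The mean of a set of region codes has no real meaning, which is why such columns are usually better stored as factors.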

Let's create a sample data frame called mydata having 5 variables (var1, var2, var3, var4 and var5).

# Create a dummy data frame
mydata <- data.frame(
  var1 = c("A", "B", "C"),
  var2 = c("X", "Y", "Z"),
  var3 = c(1, 2, 3),
  var4 = c(7, 8, 9),
  var5 = c("G", "H", "I")
)

How to Convert all Numeric Columns to Factor in R

The data frame mydata has two numeric columns, var3 and var4. We want to convert both of them to factor without naming them explicitly.

In base R, you can convert multiple columns (variables) to factor using the lapply function. lapply belongs to the apply family of functions, which iterate (loop) over the elements of a list. Since a data frame is a list of columns, lapply applies a function to each column.

In the dplyr package, the mutate function modifies the columns of a data frame, and the across function applies a transformation to multiple columns at once. Here, where(is.numeric) selects only the numeric columns, and as.factor is then applied to each of them.

Base R

mydata[sapply(mydata, is.numeric)] <- lapply(mydata[sapply(mydata, is.numeric)], as.factor)
str(mydata)
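The one-line base R version can be split into two steps, which may be easier to read and debug. This is only a sketch of the same idea: the name is_num is illustrative, and vapply is used in place of sapply so that the result is guaranteed to be a logical vector.

```r
# Same sample data as in the article
mydata <- data.frame(
  var1 = c("A", "B", "C"),
  var2 = c("X", "Y", "Z"),
  var3 = c(1, 2, 3),
  var4 = c(7, 8, 9),
  var5 = c("G", "H", "I")
)

# Step 1: logical vector flagging the numeric columns (computed once)
is_num <- vapply(mydata, is.numeric, logical(1))
print(is_num)

# Step 2: convert only the flagged columns to factor
mydata[is_num] <- lapply(mydata[is_num], as.factor)

# Check the class of every column after the conversion
print(sapply(mydata, class))
```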

dplyr

library(dplyr)

mydata <- mydata %>%
  mutate(across(where(is.numeric), as.factor))

str(mydata)

How to Convert all Columns to Factor in R

The names(mydata) command returns a character vector containing the names of all the columns in the data frame mydata. Passing this vector as the column selection converts every column.

Base R

col_names <- names(mydata)
mydata[,col_names] <- lapply(mydata[,col_names] , factor)
str(mydata)
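When every column is converted, the column names are not strictly needed. A common base R idiom, shown here as an alternative sketch rather than the article's method, assigns into mydata[] so that the result stays a data frame with the same column names.

```r
# Same sample data as in the article
mydata <- data.frame(
  var1 = c("A", "B", "C"),
  var2 = c("X", "Y", "Z"),
  var3 = c(1, 2, 3),
  var4 = c(7, 8, 9),
  var5 = c("G", "H", "I")
)

# Empty brackets on the left replace the contents column by column
# while keeping the data frame structure and column names
mydata[] <- lapply(mydata, factor)

print(sapply(mydata, class))
```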

dplyr

library(dplyr)

col_names <- names(mydata)
mydata <- mydata %>%
  mutate(across(all_of(col_names), as.factor))

str(mydata)

Converting Columns to Factor in R using Column Position

In this case, we convert the first, second, third and fifth columns of the data frame mydata to factor variables.

Base R

names <- c(1:3,5)
mydata[,names] <- lapply(mydata[,names] , factor)
str(mydata)
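One thing to watch when selecting by position: with a single column, mydata[, j] returns a plain vector instead of a data frame, so lapply would loop over individual values rather than over columns. The sketch below uses a small made-up two-column data frame and a hypothetical variable col_idx to show that list-style indexing, mydata[j], avoids this.

```r
# Small hypothetical data frame for illustration
mydata <- data.frame(var1 = c("A", "B", "C"), var3 = c(1, 2, 3))

# Matrix-style indexing of one column drops to a plain vector
dropped <- mydata[, 2]
print(class(dropped))

# List-style indexing keeps a one-column data frame
kept <- mydata[2]
print(class(kept))

# Safe conversion that works for one column or many
col_idx <- 2
mydata[col_idx] <- lapply(mydata[col_idx], factor)
str(mydata)
```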

dplyr

library(dplyr)

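# Column positions to convert
names <- c(1:3, 5)
# all_of() marks names as an external vector, not a column called "names"
mydata <- mydata %>%
  mutate(across(all_of(names), as.factor))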

str(mydata)

Converting Columns to Factor in R using Column Names

In this case, we are converting two columns 'var2' and 'var5' to factor variables.

Base R

names <- c('var2' ,'var5')
mydata[,names] <- lapply(mydata[,names] , factor)
str(mydata)

dplyr

library(dplyr)

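# Column names to convert
names <- c('var2', 'var5')
# all_of() marks names as an external vector, not a column called "names"
mydata <- mydata %>%
  mutate(across(all_of(names), as.factor))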

str(mydata)

Convert Columns to Factor in R based on condition

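Suppose you want to convert to factor only those columns that have fewer than 4 unique values. First count the unique values in each column, then convert the columns that meet the condition. In mydata every column has 3 unique values, so all five columns are converted.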

Base R

col_names <- sapply(mydata, function(col) length(unique(col)) < 4)
mydata[ , col_names] <- lapply(mydata[ , col_names] , factor)

dplyr

library(dplyr)

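# TRUE for each column with fewer than 4 unique values
col_names <- sapply(mydata, function(col) length(unique(col)) < 4)
# all_of() passes the qualifying column names as an external vector
mydata <- mydata %>%
  mutate(across(all_of(names(col_names)[col_names]), as.factor))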

str(mydata)
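With dplyr, the condition can also be written directly inside where(), which accepts any function that returns TRUE or FALSE for a column. The sketch below is an alternative to the approach above and uses a made-up data frame with six rows so that one column (id) fails the condition. The helper name has_few_values is hypothetical.

```r
library(dplyr)

# Hypothetical data: id has 6 unique values, grp and flag have 2 each
mydata <- data.frame(
  id = 1:6,
  grp = c(1, 2, 1, 2, 1, 2),
  flag = c("y", "n", "y", "y", "n", "n"),
  stringsAsFactors = FALSE
)

# Predicate: TRUE when a column has fewer than 4 unique values
has_few_values <- function(col) n_distinct(col) < 4

# Convert only the columns for which the predicate is TRUE
mydata <- mydata %>%
  mutate(across(where(has_few_values), as.factor))

str(mydata)
```

Here id stays an integer column because it has six distinct values, while grp and flag become factors.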
About Author:
Deepanshu Bhalla

Deepanshu founded ListenData with a simple objective - Make analytics easy to understand and follow. He has over 10 years of experience in data science. During his tenure, he worked with global clients in various domains like Banking, Insurance, Private Equity, Telecom and HR.

8 Responses to "R : Converting Multiple Columns to Factor"
  1. wow.... thank you so much for this. i've been searching for this all over the internet and finally found it here...

  2. Thank you, it was what I was looking for!

  3. I believe that in 5. the right code for col_names is:

    col_names <- sapply(mydata,
    function(col) { length(unique(col)) < 4 } )

  4. I followed your data type conversion example on my Excel ".xlsx" file. The numeric columns were converted into factors which is required by the package that I am using. However, when I run the R package, I get an error that goes like this: Error in '$<- .data.frame.'(*tmp*', "Trt", value = character(0)) replacement has 0 rows, data has 20.

    When I check the data type conversion using str() function, the numeric columns were converted to factors as I desired. However, it seems that the "myData[, names]" statement did not capture any of the data rows in the dataframe when in fact it should.

    Any helpful thoughts about my problem?

    Thank you.

  6. Hi

    Can you please clarify that variables like exposuretime, size, concentration should be included in the generalized linear model as numeric or factors? Thanks

