In R, you can convert multiple numeric variables to factors using the lapply function. lapply is part of the apply family of functions, which iterate (loop) over the elements of an object. In R, categorical variables need to be stored as factors. Numeric variables that are categorical in nature should therefore be converted to factors so that R treats them as grouping variables.
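As a small illustration of why the conversion matters, the sketch below uses a made-up vector of codes (`rating` and `rating_f` are hypothetical names, not from the original data). While the vector is numeric, `summary()` reports quantiles and a mean; once it is a factor, R works with one group per distinct code.

```r
# Hypothetical example data: a rating coded as 1, 2 or 3
rating <- c(1, 2, 2, 3, 1, 2)

# As a numeric vector, R treats the codes as quantities
summary(rating)

# As a factor, R treats each distinct code as a group
rating_f <- factor(rating)
levels(rating_f)   # "1" "2" "3"
table(rating_f)    # counts per level: 2, 3, 1
```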
Converting Numeric Variables to Factor
1. Using Column Index Numbers
In this case, we are converting the first, second, third and fifth columns, all numeric, to factor variables. mydata is a data frame.
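To try the snippets in this post without your own data, a small stand-in for `mydata` can be built first. The column names `Credit` and `Balance` come from the examples in this post; the other columns (`Gender`, `Income`, `Region`) and all of the values are invented for illustration.

```r
# Stand-in data frame; the values and the columns Gender, Income and Region
# are assumptions made up for this example
mydata <- data.frame(
  Credit  = c(1, 2, 1, 3, 2, 1),
  Balance = c(0, 1, 1, 0, 0, 1),
  Gender  = c(1, 2, 2, 1, 1, 2),
  Income  = c(42.5, 38.1, 55.0, 61.3, 47.8, 50.2),
  Region  = c(1, 2, 3, 1, 2, 3)
)

# Every column starts out numeric
str(mydata)
sapply(mydata, class)
```

With this layout, columns 1, 2, 3 and 5 hold category codes, while column 4 (`Income`) is a genuine measurement that should stay numeric.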
names <- c(1:3,5)
mydata[,names] <- lapply(mydata[,names] , factor)
str(mydata)
2. Using Column Names
In this case, we are converting two variables 'Credit' and 'Balance' to factor variables.
names <- c('Credit' ,'Balance')
mydata[,names] <- lapply(mydata[,names] , factor)
str(mydata)
3. Converting all variables
col_names <- names(mydata)
mydata[,col_names] <- lapply(mydata[,col_names] , factor)
4. Converting all numeric variables
mydata[sapply(mydata, is.numeric)] <- lapply(mydata[sapply(mydata, is.numeric)], as.factor)
5. Checking the unique values in each variable and converting only those variables with fewer than 4 unique values
col_names <- sapply(mydata, function(col) length(unique(col)) < 4)
mydata[ , col_names] <- lapply(mydata[ , col_names] , factor)
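One thing to watch with this approach: the set of selected columns depends on the data, so it can end up being a single column. With a plain data frame, `df[, cols]` then returns a vector rather than a data frame, and `lapply()` would loop over its individual elements. The sketch below, on invented data (`df`, `max_levels` and `few_levels` are hypothetical names), uses `df[cols]` instead, which keeps the data frame structure however many columns are selected.

```r
# Invented data: only 'grp' has fewer than 4 distinct values
df <- data.frame(
  grp   = c(1, 2, 1, 2, 1),
  score = c(10.5, 11.2, 9.8, 12.1, 10.9)
)

max_levels <- 4  # same cut-off as in the text: fewer than 4 unique values

# Logical vector: TRUE for columns with fewer than max_levels unique values
few_levels <- sapply(df, function(col) length(unique(col)) < max_levels)

# df[few_levels] stays a data frame even when only one column is selected
df[few_levels] <- lapply(df[few_levels], factor)

str(df)
```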
wow.... thank you so much for this. i've been searching for this all over the internet and finally found it here...
even i
these do not work
Thank you, it was what I was looking for!
I believe that in 5. the right code for col_names is:
col_names <- sapply(mydata,
function(col) { length(unique(col)) < 4 })
I followed your data type conversion example on my Excel ".xlsx" file. The numeric columns were converted into factors, which is required by the package that I am using. However, when I run the R package, I get an error that goes like this: Error in `$<-.data.frame`(`*tmp*`, "Trt", value = character(0)): replacement has 0 rows, data has 20.
When I check the data type conversion using the str() function, the numeric columns were converted to factors as I desired. However, it seems that the "myData[, names]" statement did not capture any of the data rows in the data frame when in fact it should.
Any helpful thoughts about my problem?
Thank you.
Hi
Can you please clarify whether variables like exposuretime, size, concentration should be included in the generalized linear model as numeric or factors? Thanks