In this tutorial, you will learn how to split a sample into training and test data sets with R.

The following code randomly selects 70% of the rows for the training set and assigns the remaining 30% to the test set.


data <- read.csv("c:/datafile.csv")

dt <- sort(sample(nrow(data), nrow(data) * 0.7))

train <- data[dt, ]

test <- data[-dt, ]
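Because `sample()` draws different rows on every run, the split above is not reproducible as written. The following is a minimal sketch, assuming the built-in `mtcars` data set stands in for the CSV file and using an arbitrary seed value; the names `cars_data`, `n_train`, `train_set` and `test_set` are illustrative and not part of the original code.

```r
# Fix the random seed so the same rows are drawn on every run (seed value is arbitrary)
set.seed(123)

# mtcars (32 rows) stands in for the data read from the CSV file
cars_data <- mtcars

# Number of training rows: 70% of the data, rounded down to a whole number
n_train <- floor(0.7 * nrow(cars_data))

# Draw the training row indices without replacement
dt <- sort(sample(nrow(cars_data), n_train))

train_set <- cars_data[dt, ]
test_set  <- cars_data[-dt, ]

# The two sets should be disjoint and together cover every row
nrow(train_set)
nrow(test_set)
length(intersect(rownames(train_set), rownames(test_set)))
```

Sorting the indices only keeps the rows in their original order inside each set; which rows end up in the training set is still random.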

The **sample( )** function randomly picks 70% of the rows from the data set. It samples without replacement.

**Method 2 :** To maintain the same percentage of event rate in both the training and validation data sets.

library(caret)

set.seed(3456)

trainIndex <- createDataPartition(data$FD, p = .7, list = FALSE, times = 1)

Train <- data[trainIndex, ]

Valid <- data[-trainIndex, ]

In the above program, **FD** is a dependent variable having two values, 1 and 0. **Make sure it is defined in factor format.**
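To illustrate what "same percentage of event rate" means, here is a rough base-R sketch that samples 70% of the rows separately within each level of the dependent variable. It is not caret's implementation, only the same idea under simple assumptions: the data frame `toy`, its column `x`, and the 30% event rate are made up for the example.

```r
set.seed(3456)

# Hypothetical data: 100 rows, of which 30 are events (FD = 1)
toy <- data.frame(
  x  = rnorm(100),
  FD = factor(rep(c(1, 0), times = c(30, 70)))
)

# Group the row numbers by class, then draw 70% of the rows inside each class
rows_by_class <- split(seq_len(nrow(toy)), toy$FD)
train_idx <- sort(unlist(lapply(rows_by_class, function(idx) {
  idx[sample.int(length(idx), size = round(0.7 * length(idx)))]
}), use.names = FALSE))

train_toy <- toy[train_idx, ]
valid_toy <- toy[-train_idx, ]

# Share of each class in the two sets; both should match the full data
prop.table(table(train_toy$FD))
prop.table(table(valid_toy$FD))
```

Sampling within each class is what keeps the proportion of 1s and 0s the same in both sets, whereas a plain random split can drift from the original event rate, especially when events are rare.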

This won't randomize the order, bad option

What do you mean by "This won't randomize the order"? The sample function randomizes the order.

One more way to split the data into two parts:

install.packages("caTools")

library(caTools)

iris <- iris

iris$spl <- sample.split(iris$Species, SplitRatio = 0.7)

head(iris)

train <- subset(iris, spl == TRUE)

test <- subset(iris, spl == FALSE)

This is the correct solution when you have a dependent column. Thank you, Rishabh.

It helped me a lot. Thank you

Very helpful and to the point, thanks Deepanshu!