This article covers useful functions of the caret package in R. If you are new to the caret package, check out the Part I Tutorial.
How cross validation works in caret
Method Options in the trainControl Function
- none - No cross validation or Bootstrapping
- boot - Bootstrapping
- cv - Cross validation
- repeatedcv - Repeated Cross Validation
- oob - Out of Bag (only for random forest, bagged trees, bagged earth, bagged flexible discriminant analysis, or conditional tree forest models)
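As a sketch of how these options are passed to trainControl (assuming the caret package is installed; the object names are illustrative):

```r
library(caret)

# one trainControl object per resampling method (names are illustrative)
ctrl_none <- trainControl(method = "none")              # no resampling
ctrl_boot <- trainControl(method = "boot", number = 25) # 25 bootstrap samples
ctrl_cv   <- trainControl(method = "cv", number = 10)   # 10-fold cross validation
ctrl_rcv  <- trainControl(method = "repeatedcv",
                          number = 10, repeats = 3)     # 10-fold CV repeated 3 times
ctrl_oob  <- trainControl(method = "oob")               # out-of-bag (rf, treebag, ...)
```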
The idea of cross validating or bootstrapping the training samples is to select the best parameters for a model. See the detailed explanation in the 'How it works' section below.
Example
1. Specify a parameter grid for fine-tuning the GBM model
grid <- expand.grid(.n.trees = seq(10, 50, 10),
                    .interaction.depth = seq(1, 4, 1),
                    .shrinkage = c(0.01, 0.001),
                    .n.minobsinnode = seq(5, 20, 5))
Here n.trees = seq(10, 50, 10) means the model is tuned over 10, 20, 30, 40 and 50 trees.
2. 10 fold Cross Validation
train_control <- trainControl(method = 'cv', number = 10, classProbs = TRUE, summaryFunction = twoClassSummary)
3. Train GBM Model
fit <- train(x, y, method = "gbm", metric = "ROC", trControl = train_control, tuneGrid = grid)
This approach is used to select the final model.
- In the above example there are 160 (5*4*2*4) possible parameter combinations
- For each parameter combination train performs a 10-fold cross validation
- For each parameter combination and for each fold (of the 10 folds) the performance metric (AUC) is computed (1600 AUC scores are computed)
- For each parameter combination the mean of the performance metric is computed over the 10 folds
- The parameter combination that has the best mean performance metric is considered the best set of parameters for the model
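The bookkeeping described above can be sketched in plain R (the AUC values below are random placeholders, not real model scores):

```r
# 160 parameter combinations, as in the grid above
grid <- expand.grid(n.trees           = seq(10, 50, 10),
                    interaction.depth = seq(1, 4, 1),
                    shrinkage         = c(0.01, 0.001),
                    n.minobsinnode    = seq(5, 20, 5))
nrow(grid)  # 5 * 4 * 2 * 4 = 160

# simulate the 1600 AUC scores: one row per combination, one column per fold
set.seed(1)
auc <- matrix(runif(nrow(grid) * 10, 0.5, 1), nrow = nrow(grid))

mean_auc <- rowMeans(auc)                # mean AUC over the 10 folds
best     <- grid[which.max(mean_auc), ] # best parameter combination
```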
How to see the best model and best tuning parameter
1. Submit fit$results to see the model results of each parameter combination
2. Submit fit$bestTune to see the best tuning parameter
3. Submit fit$finalModel to see the results of final model
4. Submit fit$resample to see the performance over 10 folds
5. To see the individual predictions made during cross validation, enable savePredictions = TRUE in trainControl, then look at fit$pred
Selecting the Least Complex Model
Step I : Train your model
set.seed(825)
gbmFit3 <- train(Class ~ ., data = training, method = "gbm", trControl = fitControl, verbose = FALSE, tuneGrid = gbmGrid, metric = "ROC")
Step II : Tolerance function
It selects the least complex model within some percent tolerance of the best value. In the code below, tol = 2 means accepting up to a 2% loss in AUC (ROC) score.
whichTwoPct <- tolerance(gbmFit3$results, metric = "ROC", tol = 2, maximize = TRUE)
cat("best model within 2 pct of best:\n")
gbmFit3$results[whichTwoPct,1:6]
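Conceptually, tolerance() computes each model's percent loss relative to the best score and returns the first (least complex) model within the tolerance. A simplified sketch with made-up ROC values:

```r
# hypothetical ROC scores, ordered from least to most complex model
roc <- c(0.85, 0.885, 0.90, 0.895)

# percent loss relative to the best score (the maximize = TRUE case)
loss <- (max(roc) - roc) / max(roc) * 100

# index of the least complex model within 2% of the best
which(loss <= 2)[1]
```

Here the second model loses only about 1.7% of ROC versus the best, so it is preferred over the more complex third model.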