This article explains useful functions of the **caret package in R.** If you are new to the caret package, check out **Part I Tutorial.**

**Method Functions in the trainControl Parameter**

- **none** - No cross validation or bootstrapping
- **boot** - Bootstrapping
- **cv** - Cross validation
- **repeatedcv** - Repeated cross validation
- **oob** - Out of bag (only for random forest, bagged trees, bagged earth, bagged flexible discriminant analysis, or conditional tree forest models)
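As a quick sketch, each method above maps onto a trainControl call like this (assuming the caret package is loaded; the resample counts shown are illustrative, not recommendations):

```r
library(caret)

ctrl_none <- trainControl(method = "none")                # no resampling at all
ctrl_boot <- trainControl(method = "boot", number = 25)   # 25 bootstrap resamples
ctrl_cv   <- trainControl(method = "cv", number = 10)     # 10-fold cross validation
ctrl_rcv  <- trainControl(method = "repeatedcv",
                          number = 10, repeats = 3)       # 10-fold CV repeated 3 times
ctrl_oob  <- trainControl(method = "oob")                 # out-of-bag (random forest etc.)
```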

The idea of cross validating or bootstrapping the training samples is to select the best parameters for a model. See the detailed explanation below under the **'How it works'** section.

**Example**

**1. Specify a parameter grid for fine-tuning a GBM model**

grid <- expand.grid( .n.trees=seq(10,50,10), .interaction.depth=seq(1,4,1), .shrinkage=c(0.01,0.001), .n.minobsinnode=seq(5,20,5))

**n.trees=seq(10,50,10)** implies fine-tuning the model with the number of trees set to 10, 20, 30, 40 and 50.

**2. 10-fold Cross Validation**

train_control <- trainControl(method = 'cv', number = 10, classProbs = TRUE, summaryFunction = twoClassSummary)

(summaryFunction = twoClassSummary is required so that caret computes class-probability metrics such as ROC/AUC.)

**3. Train the GBM Model**

fit <- train(x, y, method = "gbm", metric = "ROC", trControl = train_control, tuneGrid = grid)

**How cross validation works in caret**

This approach is used to select the final model:

- In the above example there are 160 (5 * 4 * 2 * 4) possible parameter combinations.
- For each parameter combination, train performs a 10-fold cross validation.
- For each parameter combination and each of the 10 folds, the performance metric (AUC) is computed, giving 1,600 AUC scores in total.
- For each parameter combination, the mean of the performance metric over the 10 folds is computed.
- The parameter combination with the best mean performance metric is considered the best set of parameters for the model.
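The arithmetic in the bullets above can be checked directly from the tuning grid itself (a base-R sketch; no caret needed):

```r
# Same tuning grid as in the example above
grid <- expand.grid(n.trees = seq(10, 50, 10),
                    interaction.depth = seq(1, 4, 1),
                    shrinkage = c(0.01, 0.001),
                    n.minobsinnode = seq(5, 20, 5))

n_combos <- nrow(grid)     # 5 * 4 * 2 * 4 = 160 parameter combinations
n_scores <- n_combos * 10  # one AUC per combination per fold -> 1600 scores
```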

**How to see the best model and best tuning parameter**

1. Submit **fit$results** to see the model results for each parameter combination.
2. Submit **fit$bestTune** to see the best tuning parameter.
3. Submit **fit$finalModel** to see the results of the final model.
4. Submit **fit$resample** to see the performance over the 10 folds.
5. To see the individual predictions made during cross validation, enable **savePredictions = TRUE** in trainControl, then look at **fit$pred**.

**Selecting the Least Complex Model**

**Step I : Train your model**

set.seed(825)

gbmFit3 <- train(Class ~ ., data = training, method = "gbm", trControl = fitControl, verbose = FALSE, tuneGrid = gbmGrid, metric = "ROC")

**Step II : Tolerance function**

It selects the least complex model within some percent tolerance of the best value. In the call below, tol = 2 means accepting up to a 2% loss of AUC score.

whichTwoPct <- tolerance(gbmFit3$results, metric = "ROC", tol = 2, maximize = TRUE)

cat("best model within 2 pct of best:\n")

gbmFit3$results[whichTwoPct,1:6]
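Conceptually, tolerance() computes each model's percent loss relative to the best metric value and returns the first (least complex) model within tol. A minimal base-R sketch of that rule, using a hypothetical ROC column ordered from least to most complex:

```r
# Hypothetical ROC values, ordered from least to most complex model
roc  <- c(0.880, 0.900, 0.905, 0.906)
loss <- (max(roc) - roc) / max(roc) * 100  # percent loss vs. the best ROC
pick <- which(loss <= 2)[1]                # first model within 2% of the best
# pick == 2: model 1 loses ~2.9% AUC (too much), model 2 only ~0.7%
```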
