Use cross-validation to find the optimal nrounds for a mixgb imputer. Note that this method relies on the complete cases of a dataset to obtain the optimal nrounds.
Usage
mixgb_cv(
data,
nfold = 5,
nrounds = 100,
early_stopping_rounds = 10,
response = NULL,
select_features = NULL,
xgb.params = list(),
stringsAsFactors = FALSE,
verbose = TRUE,
...
)
Arguments
- data
A data.frame or a data.table with missing values.
- nfold
The number of equal-sized subsamples into which the data are randomly partitioned. Default: 5
- nrounds
The maximum number of iterations in XGBoost training. Default: 100
- early_stopping_rounds
An integer value k. Training will stop if the validation performance has not improved for k rounds. Default: 10
- response
The name or the column index of a response variable. Default: NULL (randomly select an incomplete variable). See the sketch after this list.
- select_features
The names or the indices of selected features. Default: NULL (select all the other variables in the dataset).
- xgb.params
A list of XGBoost parameters. For more details, please check the XGBoost documentation on parameters.
- stringsAsFactors
A logical value indicating whether all character vectors in the dataset should be converted to factors.
- verbose
A logical value indicating whether to print out cross-validation results during the process.
- ...
Extra arguments to be passed to XGBoost.
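For instance, a minimal sketch of fixing the response and features yourself instead of letting mixgb_cv pick an incomplete variable at random. The column indices below are placeholders for illustration, not a recommendation for any particular dataset:
# Sketch only: indices 1, 2 and 3 are placeholders; substitute columns
# from your own data. verbose = FALSE suppresses the per-round log.
cv.fixed <- mixgb_cv(data = nhanes3, response = 1,
                     select_features = c(2, 3), verbose = FALSE)
cv.fixed$best.nrounds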
Examples
params <- list(max_depth = 3, subsample = 0.7, nthread = 2)
cv.results <- mixgb_cv(data = nhanes3, xgb.params = params)
#> [1] train-rmse:30.560130+0.035170 test-rmse:30.558779+0.156891
#> Multiple eval metrics are present. Will use test_rmse for early stopping.
#> Will train until test_rmse hasn't improved in 10 rounds.
#>
#> [2] train-rmse:21.491571+0.039776 test-rmse:21.477811+0.139836
#> [3] train-rmse:15.133326+0.022402 test-rmse:15.139801+0.143472
#> [4] train-rmse:10.689804+0.016780 test-rmse:10.713249+0.147504
#> [5] train-rmse:7.566596+0.011958 test-rmse:7.596152+0.160867
#> [6] train-rmse:5.403956+0.013892 test-rmse:5.439515+0.158890
#> [7] train-rmse:3.904181+0.006859 test-rmse:3.945961+0.158859
#> [8] train-rmse:2.872248+0.008051 test-rmse:2.937435+0.170109
#> [9] train-rmse:2.182927+0.013810 test-rmse:2.273569+0.166005
#> [10] train-rmse:1.733620+0.020820 test-rmse:1.868820+0.171161
#> [11] train-rmse:1.446156+0.028137 test-rmse:1.620598+0.142336
#> [12] train-rmse:1.274952+0.039978 test-rmse:1.478661+0.136210
#> [13] train-rmse:1.172367+0.037235 test-rmse:1.405440+0.124473
#> [14] train-rmse:1.111263+0.037924 test-rmse:1.361414+0.123372
#> [15] train-rmse:1.075399+0.037568 test-rmse:1.340916+0.122293
#> [16] train-rmse:1.047790+0.031334 test-rmse:1.326275+0.123997
#> [17] train-rmse:1.029318+0.030185 test-rmse:1.325579+0.126623
#> [18] train-rmse:1.017069+0.030459 test-rmse:1.318768+0.132906
#> [19] train-rmse:1.000146+0.029254 test-rmse:1.314193+0.139273
#> [20] train-rmse:0.988118+0.032847 test-rmse:1.313709+0.140720
#> [21] train-rmse:0.977703+0.035856 test-rmse:1.319082+0.146460
#> [22] train-rmse:0.964789+0.036558 test-rmse:1.328170+0.150965
#> [23] train-rmse:0.954627+0.037283 test-rmse:1.330554+0.156060
#> [24] train-rmse:0.945773+0.038953 test-rmse:1.331667+0.162484
#> [25] train-rmse:0.936312+0.036291 test-rmse:1.333617+0.162996
#> [26] train-rmse:0.926040+0.036864 test-rmse:1.340611+0.165049
#> [27] train-rmse:0.915942+0.034340 test-rmse:1.334514+0.164930
#> [28] train-rmse:0.907743+0.035015 test-rmse:1.342779+0.167557
#> [29] train-rmse:0.900146+0.035127 test-rmse:1.352581+0.167445
#> [30] train-rmse:0.892190+0.034390 test-rmse:1.353188+0.163083
#> Stopping. Best iteration:
#> [20] train-rmse:0.988118+0.032847 test-rmse:1.313709+0.140720
#>
cv.results$best.nrounds
#> [1] 20
imputed.data <- mixgb(data = nhanes3, m = 3, xgb.params = params,
nrounds = cv.results$best.nrounds)
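Because response = NULL selects an incomplete variable at random, best.nrounds can vary between runs. A minimal sketch of a quieter, reproducible run (the seed value is arbitrary):
# Fix the random choice of response variable via the RNG seed,
# and suppress the per-round log with verbose = FALSE.
set.seed(2022)
cv.quiet <- mixgb_cv(data = nhanes3, xgb.params = params, verbose = FALSE)
cv.quiet$best.nrounds
Fixing response to a specific variable, as in the earlier sketch, removes this source of variation entirely.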