
Use cross-validation to find the optimal nrounds for a mixgb imputer. Note that this method uses only the complete cases of a dataset to obtain the optimal nrounds.
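
Because only complete cases are used, it can be worth checking how many complete rows the dataset contains before cross-validating. A minimal check using base R's complete.cases() (nhanes3 ships with mixgb):

library(mixgb)
# mixgb_cv() trains on complete cases only, so see how many are available
sum(complete.cases(nhanes3))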

Usage

mixgb_cv(
  data,
  nfold = 5,
  nrounds = 100,
  early_stopping_rounds = 10,
  response = NULL,
  select_features = NULL,
  xgb.params = list(),
  stringsAsFactors = FALSE,
  verbose = TRUE,
  ...
)

Arguments

data

A data.frame or a data.table with missing values.

nfold

The number of equal-sized subsamples into which the complete cases are randomly partitioned. Default: 5

nrounds

The maximum number of boosting iterations in XGBoost training. Default: 100

early_stopping_rounds

An integer value k. Training will stop if the validation performance has not improved for k rounds. Default: 10

response

The name or the column index of a response variable. Default: NULL (an incomplete variable is selected at random).

select_features

The names or the indices of selected features. Default: NULL (all the other variables in the dataset are selected). See the sketch after this argument list for specifying response and select_features by index.

xgb.params

A list of XGBoost parameters. For more details, please check the XGBoost documentation on parameters.

stringsAsFactors

A logical value indicating whether all character vectors in the dataset should be converted to factors.

verbose

A logical value indicating whether to print cross-validation results during training.

...

Extra arguments to be passed to XGBoost.
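
Since response and select_features accept either names or column indices, a fixed target can be cross-validated instead of a random one. A hedged sketch (the indices below are arbitrary choices for illustration, assuming nhanes3 has at least four columns):

# Use the first column as the response and columns 2-4 as features;
# these indices are illustrative only
cv.fixed <- mixgb_cv(data = nhanes3, response = 1, select_features = 2:4,
                     nrounds = 50, xgb.params = list(max_depth = 3, nthread = 2))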

Value

A list containing the optimal nrounds (best.nrounds), the evaluation.log, and the chosen response variable.
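
Each component can be used directly. A minimal sketch, assuming evaluation.log follows the xgboost::xgb.cv evaluation_log format (metric columns such as test_rmse_mean for a numeric response):

cv.results <- mixgb_cv(data = nhanes3)
cv.results$response       # the randomly selected response variable
cv.results$best.nrounds   # optimal number of boosting rounds
# Plot mean test RMSE per round (column name assumes an xgb.cv-style log
# with an RMSE metric; it differs for non-numeric responses)
plot(cv.results$evaluation.log$test_rmse_mean, type = "l",
     xlab = "boosting round", ylab = "mean test RMSE")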

Examples

# Cross-validate to find the optimal nrounds for these XGBoost parameters
params <- list(max_depth = 3, subsample = 0.7, nthread = 2)
cv.results <- mixgb_cv(data = nhanes3, xgb.params = params)
#> [1]	train-rmse:30.560130+0.035170	test-rmse:30.558779+0.156891 
#> Multiple eval metrics are present. Will use test_rmse for early stopping.
#> Will train until test_rmse hasn't improved in 10 rounds.
#> 
#> [2]	train-rmse:21.491571+0.039776	test-rmse:21.477811+0.139836 
#> [3]	train-rmse:15.133326+0.022402	test-rmse:15.139801+0.143472 
#> [4]	train-rmse:10.689804+0.016780	test-rmse:10.713249+0.147504 
#> [5]	train-rmse:7.566596+0.011958	test-rmse:7.596152+0.160867 
#> [6]	train-rmse:5.403956+0.013892	test-rmse:5.439515+0.158890 
#> [7]	train-rmse:3.904181+0.006859	test-rmse:3.945961+0.158859 
#> [8]	train-rmse:2.872248+0.008051	test-rmse:2.937435+0.170109 
#> [9]	train-rmse:2.182927+0.013810	test-rmse:2.273569+0.166005 
#> [10]	train-rmse:1.733620+0.020820	test-rmse:1.868820+0.171161 
#> [11]	train-rmse:1.446156+0.028137	test-rmse:1.620598+0.142336 
#> [12]	train-rmse:1.274952+0.039978	test-rmse:1.478661+0.136210 
#> [13]	train-rmse:1.172367+0.037235	test-rmse:1.405440+0.124473 
#> [14]	train-rmse:1.111263+0.037924	test-rmse:1.361414+0.123372 
#> [15]	train-rmse:1.075399+0.037568	test-rmse:1.340916+0.122293 
#> [16]	train-rmse:1.047790+0.031334	test-rmse:1.326275+0.123997 
#> [17]	train-rmse:1.029318+0.030185	test-rmse:1.325579+0.126623 
#> [18]	train-rmse:1.017069+0.030459	test-rmse:1.318768+0.132906 
#> [19]	train-rmse:1.000146+0.029254	test-rmse:1.314193+0.139273 
#> [20]	train-rmse:0.988118+0.032847	test-rmse:1.313709+0.140720 
#> [21]	train-rmse:0.977703+0.035856	test-rmse:1.319082+0.146460 
#> [22]	train-rmse:0.964789+0.036558	test-rmse:1.328170+0.150965 
#> [23]	train-rmse:0.954627+0.037283	test-rmse:1.330554+0.156060 
#> [24]	train-rmse:0.945773+0.038953	test-rmse:1.331667+0.162484 
#> [25]	train-rmse:0.936312+0.036291	test-rmse:1.333617+0.162996 
#> [26]	train-rmse:0.926040+0.036864	test-rmse:1.340611+0.165049 
#> [27]	train-rmse:0.915942+0.034340	test-rmse:1.334514+0.164930 
#> [28]	train-rmse:0.907743+0.035015	test-rmse:1.342779+0.167557 
#> [29]	train-rmse:0.900146+0.035127	test-rmse:1.352581+0.167445 
#> [30]	train-rmse:0.892190+0.034390	test-rmse:1.353188+0.163083 
#> Stopping. Best iteration:
#> [20]	train-rmse:0.988118+0.032847	test-rmse:1.313709+0.140720
#> 
cv.results$best.nrounds
#> [1] 20

# Impute with the cross-validated nrounds
imputed.data <- mixgb(data = nhanes3, m = 3, xgb.params = params,
                      nrounds = cv.results$best.nrounds)
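
A quick check on the result, assuming mixgb()'s default return format (a list of m imputed datasets):

# imputed.data should be a list with one completed dataset per imputation
length(imputed.data)      # 3, one per imputation
head(imputed.data[[1]])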