Generates multiply imputed datasets using XGBoost, subsampling, and predictive mean matching (PMM).
Usage
mixgb(
data,
m = 5,
maxit = 1,
ordinalAsInteger = FALSE,
pmm.type = NULL,
pmm.k = 5,
pmm.link = "prob",
initial.num = "normal",
initial.int = "mode",
initial.fac = "mode",
save.models = FALSE,
save.vars = NULL,
save.models.folder = NULL,
verbose = FALSE,
xgb.params = list(),
nrounds = 100,
early_stopping_rounds = NULL,
print_every_n = 10L,
xgboost_verbose = 0,
...
)
Arguments
- data
A data.frame or data.table with missing values
- m
The number of imputed datasets. Default: 5
- maxit
The number of imputation iterations. Default: 1
- ordinalAsInteger
Whether to convert ordinal factors to integers. By default, ordinalAsInteger = FALSE. Setting ordinalAsInteger = TRUE may speed up the imputation process for large datasets.
- pmm.type
The type of predictive mean matching (PMM). Possible values:
NULL (default): imputations without PMM;
0: imputations with PMM type 0;
1: imputations with PMM type 1;
2: imputations with PMM type 2;
"auto": imputations with PMM type 2 for numeric/integer variables; imputations without PMM for categorical variables.
- pmm.k
The number of donors for predictive mean matching. Default: 5
- pmm.link
The link for predictive mean matching in binary variables:
"prob" (default): use probabilities;
"logit": use logit values.
- initial.num
Initial imputation method for numeric type data: "normal" (default); "mean"; "median"; "mode"; "sample".
- initial.int
Initial imputation method for integer type data: "mode" (default); "sample".
- initial.fac
Initial imputation method for factor type data: "mode" (default); "sample".
- save.models
Whether to save imputation models for imputing new data later on. Default: FALSE
- save.vars
For the purpose of imputing new data, the imputation models for response variables specified in save.vars will be saved. The values in save.vars can be a vector of names or indices. By default (save.vars = NULL), only the imputation models for variables with missing values in the original data will be saved. To save imputation models for all variables, users can specify save.vars = colnames(data).
- save.models.folder
Users can specify a directory to save all imputation models. Models will be saved in JSON format by internally calling xgb.save(), which is recommended by XGBoost.
- verbose
Verbose setting for mixgb. If TRUE, will print out the progress of imputation. Default: FALSE.
- xgb.params
A list of XGBoost parameters. For more details, please check XGBoost documentation on parameters.
- nrounds
The maximum number of boosting iterations for XGBoost. Default: 100
- early_stopping_rounds
An integer value k. XGBoost training will stop if the validation performance has not improved for k rounds. Default: NULL (no early stopping), as shown in Usage.
- print_every_n
Print XGBoost evaluation information at every nth iteration if xgboost_verbose > 0.
- xgboost_verbose
Verbose setting for XGBoost training: 0 (silent), 1 (print information) and 2 (print additional information). Default: 0
- ...
Extra arguments to be passed to XGBoost
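To make the role of pmm.k more concrete, the following base R sketch shows the general idea of predictive mean matching: for each missing case, find the k observed cases whose model predictions are closest to the prediction for the missing case, then borrow the observed value of one randomly chosen donor. This is a generic illustration only, not mixgb's internal implementation. The helper pmm_draw() and the toy vectors are hypothetical, and the sketch does not distinguish between PMM types 0, 1 and 2.

```r
# Generic PMM illustration (an assumption-laden sketch, NOT mixgb's code).
# pmm_draw() is a hypothetical helper introduced for this example.
pmm_draw <- function(pred_obs, y_obs, pred_mis, k = 5) {
  stopifnot(length(pred_obs) == length(y_obs), k >= 1, k <= length(y_obs))
  vapply(pred_mis, function(p) {
    # indices of the k observed cases whose predictions are closest to p
    donors <- order(abs(pred_obs - p))[seq_len(k)]
    # pick one donor at random and borrow its observed value
    y_obs[donors[sample.int(k, 1)]]
  }, numeric(1))
}

set.seed(1)
y_obs    <- c(1.2, 2.5, 3.1, 4.8, 5.0, 6.3, 7.7)  # observed values
pred_obs <- c(1.0, 2.4, 3.3, 4.5, 5.2, 6.0, 7.9)  # predictions for observed cases
pred_mis <- c(2.0, 6.5)                            # predictions for missing cases

imputed <- pmm_draw(pred_obs, y_obs, pred_mis, k = 3)
```

Every imputed value is an actually observed value, which is the main reason to use PMM. A larger k widens the donor pool and adds variability between imputations.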
Value
If save.models = FALSE, this function will return a list of m imputed datasets. If save.models = TRUE, it will return an object with imputed datasets, saved models and parameters.
Examples
# obtain m multiply imputed datasets without saving models
params <- list(max_depth = 3, subsample = 0.7, nthread = 2)
mixgb.data <- mixgb(data = nhanes3, m = 2, xgb.params = params, nrounds = 10)
# obtain m multiply imputed datasets and save models for imputing new data later on
mixgb.obj <- mixgb(data = nhanes3, m = 2, xgb.params = params, nrounds = 10,
save.models = TRUE, save.models.folder = tempdir())
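
The following sketch combines several of the arguments described above. It uses only arguments listed in Usage and, like the examples above, assumes the mixgb package is installed and that the nhanes3 dataset is available once the package is loaded. The object names imp.pmm and obj.all are illustrative. Because the internal layout of the returned object is not described here, the sketch inspects it with str() instead of assuming element names.

```r
library(mixgb)

params <- list(max_depth = 3, subsample = 0.7, nthread = 2)

# PMM type 2 for numeric/integer variables, no PMM for categorical variables,
# with 5 donors per missing value
imp.pmm <- mixgb(data = nhanes3, m = 2, pmm.type = "auto", pmm.k = 5,
                 xgb.params = params, nrounds = 10)

# With save.models = FALSE, the result is a list of m imputed datasets
length(imp.pmm)
head(imp.pmm[[1]])

# Save models for all variables, not only those with missing values
obj.all <- mixgb(data = nhanes3, m = 2, xgb.params = params, nrounds = 10,
                 save.models = TRUE, save.vars = colnames(nhanes3),
                 save.models.folder = tempdir())

# Inspect the top-level structure of the returned object
str(obj.all, max.level = 1)
```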