xgb_train_offset() and xgb_predict_offset() are wrappers for xgboost
tree-based models where all of the model arguments are in the main function.
These functions are nearly identical to the parsnip functions
parsnip::xgb_train() and parsnip::xgb_predict() except that the
objective "count:poisson" is passed to xgboost::xgb.train() and an offset
term is added to the data set.
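To illustrate what an offset does under a Poisson objective, the sketch below uses base R's glm() as a stand-in for xgboost; it is not this package's implementation. With a log link, the expected count is exp(offset + f(x)), so an offset of log(exposure) makes the fitted model describe a rate per unit of exposure. The simulated data and all variable names are hypothetical.

```r
set.seed(1)
n <- 500
exposure <- runif(n, 100, 10000)      # hypothetical population at risk
group <- rbinom(n, 1, 0.5)            # hypothetical binary predictor
rate <- exp(-5 + 0.7 * group)         # true event rate per unit of exposure
dat <- data.frame(
  y = rpois(n, rate * exposure),      # observed counts
  group = group,
  exposure = exposure
)

# log(E[y]) = log(exposure) + b0 + b1 * group
fit <- glm(y ~ group + offset(log(exposure)), family = poisson(), data = dat)

# Same predictor value, two different exposures
new_data <- data.frame(group = c(1, 1), exposure = c(1000, 2000))
pred <- predict(fit, new_data, type = "response")

# Doubling the exposure doubles the expected count
ratio <- unname(pred[2] / pred[1])
```

Because the offset enters the linear predictor with a fixed coefficient of one, predictions scale in direct proportion to exposure, which is why the offset column must also be supplied at prediction time.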
Usage
xgb_train_offset(
  x,
  y,
  offset_col = "offset",
  weights = NULL,
  max_depth = 6,
  nrounds = 15,
  eta = 0.3,
  colsample_bynode = NULL,
  colsample_bytree = NULL,
  min_child_weight = 1,
  gamma = 0,
  subsample = 1,
  validation = 0,
  early_stop = NULL,
  counts = TRUE,
  ...
)

xgb_predict_offset(object, new_data, offset_col = "offset", ...)
Arguments
- x
A data frame or matrix of predictors
- y
A vector (numeric) or matrix (numeric) of outcome data.
- offset_col
Character string. The name of a column in data containing offsets.
- weights
A numeric vector of weights.
- max_depth
An integer for the maximum depth of the tree.
- nrounds
An integer for the number of boosting iterations.
- eta
A numeric value between zero and one to control the learning rate.
- colsample_bynode
Subsampling proportion of columns for each node within each tree. See the counts argument below. The default uses all columns.
- colsample_bytree
Subsampling proportion of columns for each tree. See the counts argument below. The default uses all columns.
- min_child_weight
A numeric value for the minimum sum of instance weights needed in a child to continue to split.
- gamma
A number for the minimum loss reduction required to make a further partition on a leaf node of the tree.
- subsample
Subsampling proportion of rows. By default, all of the training data are used.
- validation
The proportion of the data that are used for performance assessment and potential early stopping.
- early_stop
An integer or NULL. If not NULL, it is the number of training iterations without improvement before stopping. If validation is used, performance is based on the validation set; otherwise, the training set is used.
- counts
A logical. If FALSE, colsample_bynode and colsample_bytree are both assumed to be proportions of columns (instead of counts).
- ...
Other options to pass to xgb.train() or xgboost's method for predict().
- object
An xgboost object.
- new_data
New data for predictions. Can be a data frame, matrix, or xgb.DMatrix.
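The interaction between counts and the two colsample_* arguments can be sketched as follows. xgboost itself expects column-sampling values as proportions, so a value given as a number of columns has to be divided by the number of predictors. The helper as_col_proportion() is hypothetical and written only for illustration; it is not the package's internal function, and capping the result at 1 is an assumption.

```r
# Hypothetical helper: convert a colsample_* value to a proportion.
# `value` is the user-supplied argument, `n_cols` the number of predictors.
as_col_proportion <- function(value, n_cols, counts = TRUE) {
  if (is.null(value)) {
    return(1)                       # default: use all columns
  }
  prop <- if (counts) value / n_cols else value
  min(1, prop)                      # assumption: never exceed all columns
}

# 3 of 12 columns, given as a count
p_count <- as_col_proportion(3, n_cols = 12, counts = TRUE)

# The same setting, given directly as a proportion
p_prop <- as_col_proportion(0.25, n_cols = 12, counts = FALSE)
```

Both calls describe the same sampling rate. With counts = FALSE, as in the example further down, colsample_bynode = 1 means that every column is available at each node.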
Examples
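if (interactive()) {
  # Offset: log of the population at risk
  us_deaths$off <- log(us_deaths$population)

  # Predictor matrix that includes the offset column; drop the intercept
  x <- model.matrix(~ age_group + gender + off, us_deaths)[, -1]

  # Fit the model, naming the offset column in x
  mod <- xgb_train_offset(
    x, us_deaths$deaths,
    "off",
    eta = 1,
    colsample_bynode = 1,
    max_depth = 2,
    nrounds = 25,
    counts = FALSE
  )

  # Predict, naming the same offset column
  xgb_predict_offset(mod, x, "off")
}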