xgb_train_offset() and xgb_predict_offset() are wrappers for xgboost tree-based models where all of the model arguments are in the main function. These functions are nearly identical to the parsnip functions parsnip::xgb_train() and parsnip::xgb_predict(), except that the objective "count:poisson" is passed to xgboost::xgb.train() and an offset term is added to the data set.
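The offset most plausibly enters the model as xgboost's base margin, which is added to the link-scale prediction before the exponential inverse link of the Poisson objective. The sketch below uses xgboost directly on toy data to illustrate that mechanism; it is a hedged illustration of how such a wrapper typically works, not the package's actual implementation.

library(xgboost)

# Toy data: Poisson counts with an exposure offset on the log scale.
set.seed(1)
n <- 500
x <- matrix(rnorm(n * 2), ncol = 2, dimnames = list(NULL, c("x1", "x2")))
exposure <- runif(n, 1, 10)
y <- rpois(n, lambda = exposure * exp(0.3 * x[, 1]))

dtrain <- xgb.DMatrix(x, label = y)
# The base margin is added to the link-scale prediction, so with
# objective "count:poisson" the fitted mean is exp(log(exposure) + trees(x)).
setinfo(dtrain, "base_margin", log(exposure))

fit <- xgb.train(
  params = list(objective = "count:poisson", eta = 0.3, max_depth = 2),
  data = dtrain,
  nrounds = 25
)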
Usage
xgb_train_offset(
  x,
  y,
  offset_col = "offset",
  weights = NULL,
  max_depth = 6,
  nrounds = 15,
  eta = 0.3,
  colsample_bynode = NULL,
  colsample_bytree = NULL,
  min_child_weight = 1,
  gamma = 0,
  subsample = 1,
  validation = 0,
  early_stop = NULL,
  counts = TRUE,
  ...
)
xgb_predict_offset(object, new_data, offset_col = "offset", ...)
Arguments
- x
A data frame or matrix of predictors.
- y
A vector (numeric) or matrix (numeric) of outcome data.
- offset_col
Character string. The name of a column in data containing offsets.
- weights
A numeric vector of weights.
- max_depth
An integer for the maximum depth of the tree.
- nrounds
An integer for the number of boosting iterations.
- eta
A numeric value between zero and one to control the learning rate.
- colsample_bynode
Subsampling proportion of columns for each node within each tree. See the counts argument below. The default uses all columns.
- colsample_bytree
Subsampling proportion of columns for each tree. See the counts argument below. The default uses all columns.
- min_child_weight
A numeric value for the minimum sum of instance weights needed in a child to continue to split.
- gamma
A number for the minimum loss reduction required to make a further partition on a leaf node of the tree.
- subsample
Subsampling proportion of rows. By default, all of the training data are used.
- validation
The proportion of the data that are used for performance assessment and potential early stopping.
- early_stop
An integer or NULL. If not NULL, it is the number of training iterations without improvement before stopping. If validation is used, performance is based on the validation set; otherwise, the training set is used.
- counts
A logical. If FALSE, colsample_bynode and colsample_bytree are both assumed to be proportions of the proportion of columns affected (instead of counts); see the example after this list.
- ...
Other options to pass to xgb.train() or xgboost's method for predict().
- object
An xgboost object.
- new_data
New data for predictions. Can be a data frame, matrix, or xgb.DMatrix.
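The interaction between counts, the colsample_* arguments, validation, and early_stop can be seen in a single call. The following is an illustrative sketch only: pred_x and counts_y are hypothetical stand-ins for a predictor matrix (containing a log-offset column named "off") and a count outcome vector, and the values shown are not defaults.

# Hold out 20% of rows for performance assessment and stop after
# 5 iterations without improvement on that validation set.
# Because counts = FALSE, colsample_* are treated as proportions.
mod_es <- xgb_train_offset(
  pred_x, counts_y, "off",          # hypothetical data objects
  colsample_bynode = 0.8, colsample_bytree = 0.8, counts = FALSE,
  validation = 0.2, early_stop = 5,
  nrounds = 200
)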
Examples
# Use the log of population as the exposure offset
us_deaths$off <- log(us_deaths$population)
# Build the predictor matrix, keeping the offset column alongside the dummies
x <- model.matrix(~ age_group + gender + off, us_deaths)[, -1]
mod <- xgb_train_offset(x, us_deaths$deaths, "off",
                        eta = 1, colsample_bynode = 1,
                        max_depth = 2, nrounds = 25,
                        counts = FALSE)
xgb_predict_offset(mod, x, "off")
#> [1] 86917.36 87856.75 88831.02 90268.80 91491.18 92494.47 93641.54
#> [8] 94210.34 94604.33 94789.28 32818.43 32719.24 32662.38 32737.09
#> [15] 32789.09 32678.39 32982.15 33278.13 33562.50 33921.54 74569.45
#> [22] 73790.58 72925.19 72384.32 71909.84 71223.78 70527.45 69285.33
#> [29] 68012.13 67153.13 145755.89 147822.38 150606.58 153536.02 156549.61
#> [36] 158776.39 160801.28 161806.72 162365.67 162106.05 161415.55 171863.98
#> [43] 180423.08 188895.03 197082.48 204894.67 212519.39 218433.97 225662.45
#> [50] 233487.92 270470.00 271263.03 273451.59 277127.31 281051.25 286639.03
#> [57] 295249.28 308080.94 319192.50 328322.53 486268.75 496409.81 505989.22
#> [64] 512867.00 520395.28 525788.94 530030.50 533801.06 535023.19 536953.31
#> [71] 104308.39 105768.89 107269.27 108898.67 110529.01 112018.90 113959.95
#> [78] 115047.35 115783.27 116205.40 48726.80 48606.59 48538.31 48571.93
#> [85] 48678.86 48556.37 49077.60 49604.14 50096.76 50708.27 85581.42
#> [92] 84756.04 83831.90 83272.05 82781.30 82032.50 81256.07 79836.87
#> [99] 78401.77 77440.64 160666.69 162808.19 165905.03 169100.42 172538.61
#> [106] 175026.88 177291.45 178525.95 179404.23 179318.72 226630.16 242347.41
#> [113] 255214.25 267142.56 278895.47 289684.75 300198.97 308180.25 317990.84
#> [120] 328460.00 298893.75 302922.44 308948.88 316074.59 323057.00 331279.03
#> [127] 343668.12 361214.72 375329.59 387399.72 272258.25 282321.62 293497.50
#> [134] 303141.19 312546.25 319811.97 327692.84 334308.69 341610.16 347101.94
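Predictions are returned on the count (response) scale. As an illustrative check only, and assuming the offset column is consumed solely as a base margin with an implicit coefficient of one (rather than also being used as a tree predictor), adding log(2) to the offset should roughly double each predicted count:

# Hypothetical check: doubling each row's population doubles its exposure,
# so the predicted death counts should roughly double as well.
x2 <- x
x2[, "off"] <- x2[, "off"] + log(2)
xgb_predict_offset(mod, x2, "off")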