Boosted Poisson Trees with Offsets via xgboost — xgb_train_offset

xgb_train_offset() and xgb_predict_offset() are wrappers for xgboost tree-based models where all of the model arguments are in the main function. These functions are nearly identical to the parsnip functions parsnip::xgb_train() and parsnip::xgb_predict(), except that the objective "count:poisson" is passed to xgboost::xgb.train() and an offset term is added to the data set.
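
To make the role of the offset concrete, the sketch below calls xgboost directly on simulated data. It assumes that the offset is supplied through the "base_margin" field of an xgb.DMatrix, which is one way xgboost accepts a fixed term on the log scale; whether xgb_train_offset() does exactly this internally is not stated here. The simulated data and the names exposure, dtrain, dnew, and ratio are illustrative only.

```r
library(xgboost)

# Simulated data (illustrative only): counts proportional to an exposure
set.seed(1)
n <- 500
x <- matrix(rnorm(n * 2), ncol = 2, dimnames = list(NULL, c("x1", "x2")))
exposure <- runif(n, 50, 500)
y <- rpois(n, exposure * exp(-3 + 0.5 * x[, "x1"]))

# Assumption: the offset enters as a base margin on the log scale
dtrain <- xgb.DMatrix(x, label = y)
setinfo(dtrain, "base_margin", log(exposure))

fit <- xgb.train(
  params = list(objective = "count:poisson", max_depth = 2, eta = 0.3),
  data = dtrain,
  nrounds = 20,
  verbose = 0
)

# Two identical rows that differ only in exposure (100 vs. 200)
x_new <- rbind(x[1, , drop = FALSE], x[1, , drop = FALSE])
dnew <- xgb.DMatrix(x_new)
setinfo(dnew, "base_margin", log(c(100, 200)))

pred <- predict(fit, dnew)

# With a log offset, doubling the exposure should double the predicted count
ratio <- pred[2] / pred[1]
```

Under this assumption the trees model the log rate and the offset only rescales the predicted count, so ratio is expected to be close to 2.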

Usage

xgb_train_offset(
  x,
  y,
  offset_col = "offset",
  weights = NULL,
  max_depth = 6,
  nrounds = 15,
  eta = 0.3,
  colsample_bynode = NULL,
  colsample_bytree = NULL,
  min_child_weight = 1,
  gamma = 0,
  subsample = 1,
  validation = 0,
  early_stop = NULL,
  counts = TRUE,
  ...
)

xgb_predict_offset(object, new_data, offset_col = "offset", ...)

Arguments

x: A data frame or matrix of predictors.
y: A vector (numeric) or matrix (numeric) of outcome data.
offset_col: Character string. The name of a column in x (or in new_data when predicting) containing offsets.
weights: A numeric vector of weights.
max_depth: An integer for the maximum depth of the tree.
nrounds: An integer for the number of boosting iterations.
eta: A numeric value between zero and one to control the learning rate.
colsample_bynode: Subsampling proportion of columns for each node within each tree. See the counts argument below. The default uses all columns.
colsample_bytree: Subsampling proportion of columns for each tree. See the counts argument below. The default uses all columns.
min_child_weight: A numeric value for the minimum sum of instance weights needed in a child to continue to split.
gamma: A number for the minimum loss reduction required to make a further partition on a leaf node of the tree.
subsample: Subsampling proportion of rows. By default, all of the training data are used.
validation: The proportion of the data that are used for performance assessment and potential early stopping.
early_stop: An integer or NULL. If not NULL, it is the number of training iterations without improvement before stopping. If validation is used, performance is based on the validation set; otherwise, the training set is used.
counts: A logical. If TRUE (the default), colsample_bynode and colsample_bytree are interpreted as counts of columns. If FALSE, both are assumed to be proportions of columns.
...: Other options to pass to xgb.train() or xgboost's method for predict().
object: An xgboost object.
new_data: New data for predictions. Can be a data frame, matrix, or xgb.DMatrix.
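
The relationship between the counts argument and the two colsample_* arguments can be sketched in a few lines. The helper below is hypothetical and not part of any package; it assumes that a column count is turned into a proportion by dividing by the number of predictor columns (capped at 1), which may differ in detail from the actual internal conversion.

```r
# Hypothetical helper (illustration only): convert a column count into
# the proportion that a colsample_* parameter would represent
count_to_prop <- function(count, p) {
  stopifnot(is.numeric(count), is.numeric(p), p > 0, count >= 1)
  min(count, p) / p
}

p <- 12                            # number of predictor columns
prop_3 <- count_to_prop(3, p)      # 3 of 12 columns
prop_all <- count_to_prop(20, p)   # more than p columns is capped at all columns
```

With counts = TRUE, a value such as colsample_bynode = 3 would correspond to prop_3 under this assumption. With counts = FALSE, the value is supplied as a proportion directly, as in the Examples section, where colsample_bynode = 1 means all columns.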

Value

xgb_train_offset() returns a fitted xgboost object. xgb_predict_offset() returns a numeric vector of predictions.

Examples

us_deaths$off <- log(us_deaths$population)
x <- model.matrix(~ age_group + gender + off, us_deaths)[, -1]

mod <- xgb_train_offset(x, us_deaths$deaths, "off",
                        eta = 1, colsample_bynode = 1,
                        max_depth = 2, nrounds = 25,
                        counts = FALSE)

xgb_predict_offset(mod, x, "off")
#>   [1]  86917.36  87856.75  88831.02  90268.80  91491.18  92494.47  93641.54
#>   [8]  94210.34  94604.33  94789.28  32818.43  32719.24  32662.38  32737.09
#>  [15]  32789.09  32678.39  32982.15  33278.13  33562.50  33921.54  74569.45
#>  [22]  73790.58  72925.19  72384.32  71909.84  71223.78  70527.45  69285.33
#>  [29]  68012.13  67153.13 145755.89 147822.38 150606.58 153536.02 156549.61
#>  [36] 158776.39 160801.28 161806.72 162365.67 162106.05 161415.55 171863.98
#>  [43] 180423.08 188895.03 197082.48 204894.67 212519.39 218433.97 225662.45
#>  [50] 233487.92 270470.00 271263.03 273451.59 277127.31 281051.25 286639.03
#>  [57] 295249.28 308080.94 319192.50 328322.53 486268.75 496409.81 505989.22
#>  [64] 512867.00 520395.28 525788.94 530030.50 533801.06 535023.19 536953.31
#>  [71] 104308.39 105768.89 107269.27 108898.67 110529.01 112018.90 113959.95
#>  [78] 115047.35 115783.27 116205.40  48726.80  48606.59  48538.31  48571.93
#>  [85]  48678.86  48556.37  49077.60  49604.14  50096.76  50708.27  85581.42
#>  [92]  84756.04  83831.90  83272.05  82781.30  82032.50  81256.07  79836.87
#>  [99]  78401.77  77440.64 160666.69 162808.19 165905.03 169100.42 172538.61
#> [106] 175026.88 177291.45 178525.95 179404.23 179318.72 226630.16 242347.41
#> [113] 255214.25 267142.56 278895.47 289684.75 300198.97 308180.25 317990.84
#> [120] 328460.00 298893.75 302922.44 308948.88 316074.59 323057.00 331279.03
#> [127] 343668.12 361214.72 375329.59 387399.72 272258.25 282321.62 293497.50
#> [134] 303141.19 312546.25 319811.97 327692.84 334308.69 341610.16 347101.94