Create a summary data frame of termination experience for a given target status.
Usage
exp_stats(
  .data,
  target_status = attr(.data, "target_status"),
  expected,
  col_exposure = "exposure",
  col_status = "status",
  wt = NULL,
  credibility = FALSE,
  conf_level = 0.95,
  cred_r = 0.05,
  conf_int = FALSE,
  control_vars,
  control_distinct_max = 25L
)

# S3 method for class 'exp_df'
summary(object, ...)
Arguments
- .data
A data frame with exposure-level records, ideally of type exposed_df
- target_status
A character vector of target status values
- expected
A character vector containing column names in .data with expected values
- col_exposure
Name of the column in .data containing exposures
- col_status
Name of the column in .data containing the policy status
- wt
Optional. Length 1 character vector. Name of the column in .data containing weights to use in the calculation of claims, exposures, partial credibility, and confidence intervals.
- credibility
If TRUE, the output will include partial credibility weights and credibility-weighted termination rates.
- conf_level
Confidence level used for the Limited Fluctuation credibility method and confidence intervals
- cred_r
Error tolerance under the Limited Fluctuation credibility method
- conf_int
If TRUE, the output will include confidence intervals around the observed termination rates and any actual-to-expected ratios.
- control_vars
".none" or a character vector containing column names in .data to use as control variables
- control_distinct_max
Maximum number of unique values allowed for control variables
- object
An exp_df object
- ...
Groups to retain after summary() is called
Value
A tibble with class exp_df
, tbl_df
, tbl
,
and data.frame
. The results include columns for any grouping variables,
claims, exposures, and observed termination rates (q_obs
).
If any values are passed to expected or control_vars, additional columns are added for expected termination rates and actual-to-expected (A/E) ratios. A/E ratios are prefixed by ae_.
If credibility is set to TRUE, additional columns are added for partial credibility and credibility-weighted termination rates (assuming values are passed to expected). Credibility-weighted termination rates are prefixed by adj_.
If conf_int is set to TRUE, additional columns are added for lower and upper confidence interval limits around the observed termination rates and any actual-to-expected ratios. Additionally, if credibility is TRUE and expected values are passed to expected, the output will contain confidence intervals around credibility-weighted termination rates. Confidence interval columns include the name of the original output column suffixed by either _lower or _upper.
If a value is passed to wt, additional columns are created containing the sum of weights (.weight), the sum of squared weights (.weight_sq), and the number of records (.weight_n).
Details
If .data is grouped, the resulting data frame will contain one row per group.
If target_status isn't provided, exp_stats() will use the same target status from .data if it has the class exposed_df. Otherwise, all status values except the first level will be assumed. This will produce a warning message.
Expected values
The expected argument is optional. If provided, this argument must be a character vector with values corresponding to column names in .data containing expected experience. More than one expected basis can be provided.
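As a rough illustration of how an expected basis can turn into an A/E ratio, the base R sketch below assumes that the expected rate for a group is the exposure-weighted average of the record-level expected rates, and that the A/E ratio is the observed rate divided by that average. The data frame, the column name expected_1, and the helper ae_summary() are hypothetical, and the calculation is an assumption rather than a statement of how exp_stats() is implemented.

```r
# Hypothetical exposure-level records: one row per policy and period
recs <- data.frame(
  status     = c("Active", "Surrender", "Active", "Surrender", "Active"),
  exposure   = c(1, 0.5, 1, 1, 0.75),
  expected_1 = c(0.02, 0.05, 0.02, 0.10, 0.05)
)

# Assumed calculation: observed rate, exposure-weighted expected rate, A/E
ae_summary <- function(df, target_status = "Surrender") {
  claims   <- sum(df$status %in% target_status)  # count of target-status records
  exposure <- sum(df$exposure)
  q_obs    <- claims / exposure
  expected <- weighted.mean(df$expected_1, df$exposure)
  data.frame(claims, exposure, q_obs,
             expected_1 = expected,
             ae_expected_1 = q_obs / expected)
}

res <- ae_summary(recs)
res
```

With more than one expected basis, the same two steps would simply be repeated per column, giving one expected column and one ae_ column for each basis.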
Control variables
The control_vars argument is optional. If provided, this argument must be ".none" (more on this below) or a character vector with values corresponding to column names in .data. Control variables are used to estimate the impact of any grouping variables on observed experience after accounting for the impact of control variables.
Mechanically, when values are passed to control_vars, a separate call is made to exp_stats() using the control variables as grouping variables. This is used to derive a new expected values basis called control, which is both added to .data and appended to the expected argument. In the final output, a column called ae_control shows the relative impact of any grouping variables after accounting for the control variables.
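The mechanics described above can be sketched in base R without the package. The example below is a simplified illustration under stated assumptions: the toy data, the claim indicator column, and the use of an exposure-weighted mean to roll up the control rate are all hypothetical choices, not a description of the package internals.

```r
# Hypothetical exposure-level records
recs <- data.frame(
  product  = c("a", "a", "a", "b", "b", "b"),
  inc_guar = c(TRUE, FALSE, TRUE, FALSE, TRUE, FALSE),
  claim    = c(0, 1, 0, 1, 1, 0),   # 1 if the record ended in the target status
  exposure = c(1, 1, 1, 1, 0.5, 1)
)

# Step 1: termination rate by control variable (the "control" basis)
by_ctrl <- aggregate(cbind(claim, exposure) ~ product, data = recs, FUN = sum)
by_ctrl$control <- by_ctrl$claim / by_ctrl$exposure

# Step 2: attach the control rate to every record
recs <- merge(recs, by_ctrl[, c("product", "control")], by = "product")

# Step 3: summarize by the grouping variable and compare to the control basis
res <- do.call(rbind, lapply(split(recs, recs$inc_guar), function(g) {
  q_obs   <- sum(g$claim) / sum(g$exposure)
  control <- weighted.mean(g$control, g$exposure)  # assumed roll-up
  data.frame(inc_guar = g$inc_guar[1], q_obs = q_obs, control = control,
             ae_control = q_obs / control)
}))
res
```

Under these assumptions, control-based expected claims summed over all records equal total observed claims, so an ungrouped roll-up has an ae_control of 1. Values above or below 1 for a group indicate experience that the control variables alone do not explain.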
About ".none": If ".none" is passed to control_vars, a single aggregate termination rate is calculated for the entire data set and used to compute control and ae_control.
The control_distinct_max argument places an upper limit on the number of unique values that a control variable is allowed to have. This limit exists to prevent an excessive number of groups on continuous or high-cardinality features.
Note that the use of control variables is a rough approximation and not a substitute for rigorous statistical models. The impact of control variables is calculated in isolation and does not consider other features or possible confounding variables. As such, control variables are most useful for exploratory data analysis.
Credibility
If credibility is set to TRUE, the output will contain a credibility column equal to the partial credibility estimate under the Limited Fluctuation credibility method (also known as Classical Credibility) assuming a binomial distribution of claims.
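For readers unfamiliar with the method, the sketch below shows the textbook square-root rule for partial credibility with binomial claims. The helper lf_credibility() is hypothetical, and whether exp_stats() uses exactly this form is an assumption. The arguments conf_level and cred_r mirror the arguments documented above.

```r
# Partial credibility under Limited Fluctuation, binomial claims (textbook form)
lf_credibility <- function(n_claims, q_obs, conf_level = 0.95, cred_r = 0.05) {
  # Two-sided standard normal quantile for the chosen confidence level
  z <- qnorm((1 + conf_level) / 2)
  # Number of claims required for full credibility
  full_cred_claims <- (z / cred_r)^2 * (1 - q_obs)
  # Square-root rule, capped at 1 (full credibility)
  pmin(1, sqrt(n_claims / full_cred_claims))
}

# A small cell and a large cell
cred <- lf_credibility(n_claims = c(56, 2869), q_obs = c(0.00725, 0.0216))
cred
```

A tighter error tolerance (smaller cred_r) or a higher conf_level raises the full-credibility standard, which lowers the partial credibility for a given claim count.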
Confidence intervals
If conf_int is set to TRUE, the output will contain lower and upper confidence interval limits for the observed termination rate and any actual-to-expected ratios. The confidence level is dictated by conf_level. If no weighting variable is passed to wt, confidence intervals will be constructed assuming a binomial distribution of claims. Otherwise, confidence intervals will be calculated assuming that the aggregate claims distribution is normal with a mean equal to observed claims and a variance equal to:
Var(S) = E(N) * Var(X) + E(X)^2 * Var(N),
where S is the aggregate claim random variable, X is the weighting variable assumed to follow a normal distribution, and N is a binomial random variable for the number of claims.
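The variance formula above can be turned into a normal-approximation interval as in the following sketch. The helper weighted_claims_ci() and all of its inputs are hypothetical. The sketch assumes that n is the number of binomial trials, that q is the claim probability, and that the mean and variance of the weights are supplied directly. How the package estimates these quantities is not shown here.

```r
# Normal-approximation interval for weighted claims, using
# Var(S) = E(N) * Var(X) + E(X)^2 * Var(N)
weighted_claims_ci <- function(claims, n, q, wt_mean, wt_var,
                               conf_level = 0.95) {
  e_n   <- n * q              # E(N) for a binomial claim count
  var_n <- n * q * (1 - q)    # Var(N)
  sd_s  <- sqrt(e_n * wt_var + wt_mean^2 * var_n)
  # Lower and upper tail probabilities for a two-sided interval
  p <- c((1 - conf_level) / 2, (1 + conf_level) / 2)
  # Interval centered on observed weighted claims
  qnorm(p, mean = claims, sd = sd_s)
}

ci <- weighted_claims_ci(claims = 5000, n = 1000, q = 0.02,
                         wt_mean = 250, wt_var = 100^2)
ci
```

Dividing both limits by the weighted exposure would give limits on the rate scale, which is one plausible way to arrive at columns such as q_obs_lower and q_obs_upper.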
If credibility is TRUE and expected values are passed to expected, the output will also contain confidence intervals for any credibility-weighted termination rates.
summary() Method
Applying summary() to an exp_df object will re-summarize the data while retaining any grouping variables passed to the "dots" (...).
Examples
toy_census |> expose("2022-12-31", target_status = "Surrender") |>
exp_stats()
#>
#> ── Experience study results ──
#>
#> • Target status: Surrender
#> • Study range: 1900-01-01 to 2022-12-31
#>
#> # A tibble: 1 × 4
#> n_claims claims exposure q_obs
#> <int> <int> <dbl> <dbl>
#> 1 1 1 35.3 0.0283
exp_res <- census_dat |>
expose("2019-12-31", target_status = "Surrender") |>
group_by(pol_yr, inc_guar) |>
exp_stats(control_vars = "product")
exp_res
#>
#> ── Experience study results ──
#>
#> • Groups: pol_yr and inc_guar
#> • Target status: Surrender
#> • Study range: 1900-01-01 to 2019-12-31
#> • Control variables: product
#> • Expected values: control
#>
#> # A tibble: 30 × 8
#> pol_yr inc_guar n_claims claims exposure q_obs control ae_control
#> <int> <lgl> <int> <int> <dbl> <dbl> <dbl> <dbl>
#> 1 1 FALSE 56 56 7720. 0.00725 0.0217 0.335
#> 2 1 TRUE 46 46 11532. 0.00399 0.0217 0.184
#> 3 2 FALSE 92 92 7103. 0.0130 0.0216 0.598
#> 4 2 TRUE 68 68 10612. 0.00641 0.0216 0.296
#> 5 3 FALSE 67 67 6447. 0.0104 0.0216 0.480
#> 6 3 TRUE 57 57 9650. 0.00591 0.0216 0.273
#> 7 4 FALSE 123 123 5799. 0.0212 0.0216 0.980
#> 8 4 TRUE 45 45 8737. 0.00515 0.0216 0.238
#> 9 5 FALSE 97 97 5106. 0.0190 0.0216 0.878
#> 10 5 TRUE 67 67 7810. 0.00858 0.0216 0.396
#> # ℹ 20 more rows
summary(exp_res)
#>
#> ── Experience study results ──
#>
#> • Target status: Surrender
#> • Study range: 1900-01-01 to 2019-12-31
#> • Control variables: product
#> • Expected values: control
#>
#> # A tibble: 1 × 6
#> n_claims claims exposure q_obs control ae_control
#> <int> <int> <dbl> <dbl> <dbl> <dbl>
#> 1 2869 2869 132634. 0.0216 0.0216 1
summary(exp_res, inc_guar)
#>
#> ── Experience study results ──
#>
#> • Groups: inc_guar
#> • Target status: Surrender
#> • Study range: 1900-01-01 to 2019-12-31
#> • Control variables: product
#> • Expected values: control
#>
#> # A tibble: 2 × 7
#> inc_guar n_claims claims exposure q_obs control ae_control
#> <lgl> <int> <int> <dbl> <dbl> <dbl> <dbl>
#> 1 FALSE 1601 1601 52123. 0.0307 0.0216 1.42
#> 2 TRUE 1268 1268 80511. 0.0157 0.0216 0.728