Create a summary data frame of transaction counts, amounts, and utilization rates.

## Usage

```
trx_stats(
.data,
trx_types,
percent_of = NULL,
combine_trx = FALSE,
col_exposure = "exposure",
full_exposures_only = TRUE,
conf_int = FALSE,
conf_level = 0.95
)
# S3 method for trx_df
summary(object, ...)
```

## Arguments

- .data
A data frame with exposure-level records of type

`exposed_df`

with transaction data attached. If necessary, use`as_exposed_df()`

to convert a data frame to an`exposed_df`

object, and use`add_transactions()`

to attach transactions to an`exposed_df`

object.- trx_types
A character vector of transaction types to include in the output. If none is provided, all available transaction types in

`.data`

will be used.- percent_of
A optional character vector containing column names in

`.data`

to use as denominators in the calculation of utilization rates or actual-to-expected ratios.- combine_trx
If

`FALSE`

(default), the results will contain output rows for each transaction type. If`TRUE`

, the results will contains aggregated experience across all transaction types.- col_exposure
Name of the column in

`.data`

containing exposures- full_exposures_only
If

`TRUE`

(default), partially exposed records will be excluded from`data`

.- conf_int
If

`TRUE`

, the output will include confidence intervals around the observed utilization rate and any`percent_of`

output columns.- conf_level
Confidence level for confidence intervals

- object
A

`trx_df`

object- ...
Groups to retain after

`summary()`

is called

## Value

A tibble with class `trx_df`

, `tbl_df`

, `tbl`

,
and `data.frame`

. The results include columns for any grouping
variables and transaction types, plus the following:

`trx_n`

: the number of unique transactions.`trx_amt`

: total transaction amount`trx_flag`

: the number of observation periods with non-zero transaction amounts.`exposure`

: total exposures`avg_trx`

: mean transaction amount (`trx_amt / trx_flag`

)`avg_all`

: mean transaction amount over all records (`trx_amt / exposure`

)`trx_freq`

: transaction frequency when a transaction occurs (`trx_n / trx_flag`

)`trx_utilization`

: transaction utilization per observation period (`trx_flag / exposure`

)

If `percent_of`

is provided, the results will also include:

The sum of any columns passed to

`percent_of`

with non-zero transactions. These columns include the suffix`_w_trx`

.The sum of any columns passed to

`percent_of`

`pct_of_{*}_w_trx`

: total transactions as a percentage of column`{*}_w_trx`

. In other words, total transactions divided by the sum of a column including only records utilizing transactions.`pct_of_{*}_all`

: total transactions as a percentage of column`{*}`

. In other words, total transactions divided by the sum of a column regardless of whether or not transactions were utilized.

If `conf_int`

is set to `TRUE`

, additional columns are added for lower and
upper confidence interval limits around the observed utilization rate and any
`percent_of`

output columns. Confidence interval columns include the name
of the original output column suffixed by either `_lower`

or `_upper`

.

If values are passed to

`percent_of`

, an additional column is created containing the the sum of squared transaction amounts (`trx_amt_sq`

).

## Details

Unlike `exp_stats()`

, this function requires `data`

to be an
`exposed_df`

object.

If `.data`

is grouped, the resulting data frame will contain
one row per transaction type per group.

Any number of transaction types can be passed to the `trx_types`

argument,
however each transaction type **must** appear in the `trx_types`

attribute of
`.data`

. In addition, `trx_stats()`

expects to see columns named `trx_n_{*}`

(for transaction counts) and `trx_amt_{*}`

for (transaction amounts) for each
transaction type. To ensure `.data`

is in the appropriate format, use the
functions `as_exposed_df()`

to convert an existing data frame with
transactions or `add_transactions()`

to attach transactions to an existing
`exposed_df`

object.

## "Percentage of" calculations

The `percent_of`

argument is optional. If provided, this argument must
be a character vector with values corresponding to columns in `.data`

containing values to use as denominators in the calculation of utilization
rates or actual-to-expected ratios. Example usage:

In a study of partial withdrawal transactions, if

`percent_of`

refers to account values, observed withdrawal rates can be determined.In a study of recurring claims, if

`percent_of`

refers to a column containing a maximum benefit amount, utilization rates can be determined.

## Confidence intervals

If `conf_int`

is set to `TRUE`

, the output will contain lower and upper
confidence interval limits for the observed utilization rate and any
`percent_of`

output columns. The confidence level is dictated
by `conf_level`

.

Intervals for the utilization rate (

`trx_util`

) assume a binomial distribution.Intervals for transactions as a percentage of another column with non-zero transactions (

`pct_of_{*}_w_trx`

) are constructed using a normal distributionIntervals for transactions as a percentage of another column regardless of transaction utilization (

`pct_of_{*}_all`

) are calculated assuming that the aggregate distribution is normal with a mean equal to observed transactions and a variance equal to:`Var(S) = E(N) * Var(X) + E(X)^2 * Var(N)`

,Where

`S`

is the aggregate transactions random variable,`X`

is an individual transaction amount assumed to follow a normal distribution, and`N`

is a binomial random variable for transaction utilization.

## Default removal of partial exposures

As a default, partial exposures are removed from `.data`

before summarizing
results. This is done to avoid complexity associated with a lopsided skew
in the timing of transactions. For example, if transactions can occur on a
monthly basis or annually at the beginning of each policy year, partial
exposures may not be appropriate. If a policy had an exposure of 0.5 years
and was taking withdrawals annually at the beginning of the year, an
argument could be made that the exposure should instead be 1 complete year.
If the same policy was expected to take withdrawals 9 months into the year,
it's not clear if the exposure should be 0.5 years or 0.5 / 0.75 years.
To override this treatment, set `full_exposures_only`

to `FALSE`

.

`summary()`

Method

Applying `summary()`

to a `trx_df`

object will re-summarize the
data while retaining any grouping variables passed to the "dots"
(`...`

).

## Examples

```
expo <- expose_py(census_dat, "2019-12-31", target_status = "Surrender") |>
add_transactions(withdrawals)
res <- expo |> group_by(inc_guar) |> trx_stats(percent_of = "premium")
res
#> Transaction study results
#>
#> Groups: inc_guar
#> Study range: 1900-01-01 to 2019-12-31
#> Transaction types: Base, Rider
#> Transactions as % of: premium
#> # A tibble: 4 × 14
#> inc_guar trx_type trx_n trx_flag trx_amt exposure avg_trx avg_all trx_freq
#> <lgl> <chr> <dbl> <int> <dbl> <dbl> <dbl> <dbl> <dbl>
#> 1 FALSE Base 52939 24703 952629 48938 38.6 19.5 2.14
#> 2 FALSE Rider 0 0 0 48938 NaN 0 NaN
#> 3 TRUE Base 7561 3521 141270 75235 40.1 1.88 2.15
#> 4 TRUE Rider 77321 35941 2842729 75235 79.1 37.8 2.15
#> # ℹ 5 more variables: trx_util <dbl>, premium_w_trx <dbl>, premium <dbl>,
#> # pct_of_premium_w_trx <dbl>, pct_of_premium_all <dbl>
summary(res)
#> Transaction study results
#>
#> Study range: 1900-01-01 to 2019-12-31
#> Transaction types: Base, Rider
#> Transactions as % of: premium
#> # A tibble: 2 × 13
#> trx_type trx_n trx_flag trx_amt exposure avg_trx avg_all trx_freq trx_util
#> <chr> <dbl> <int> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
#> 1 Base 60500 28224 1093899 124173 38.8 8.81 2.14 0.227
#> 2 Rider 77321 35941 2842729 124173 79.1 22.9 2.15 0.289
#> # ℹ 4 more variables: premium_w_trx <dbl>, premium <dbl>,
#> # pct_of_premium_w_trx <dbl>, pct_of_premium_all <dbl>
expo |> group_by(inc_guar) |>
trx_stats(percent_of = "premium", combine_trx = TRUE, conf_int = TRUE)
#> Transaction study results
#>
#> Groups: inc_guar
#> Study range: 1900-01-01 to 2019-12-31
#> Transaction types: Base, Rider
#> Transactions as % of: premium
#> # A tibble: 2 × 21
#> inc_guar trx_type trx_n trx_flag trx_amt exposure avg_trx avg_all trx_freq
#> <lgl> <chr> <dbl> <int> <dbl> <dbl> <dbl> <dbl> <dbl>
#> 1 FALSE All 52939 24703 952629 48938 38.6 19.5 2.14
#> 2 TRUE All 84882 39462 2983999 75235 75.6 39.7 2.15
#> # ℹ 12 more variables: trx_util <dbl>, premium_w_trx <dbl>, premium <dbl>,
#> # pct_of_premium_w_trx <dbl>, pct_of_premium_all <dbl>, trx_util_lower <dbl>,
#> # trx_util_upper <dbl>, pct_of_premium_w_trx_lower <dbl>,
#> # pct_of_premium_w_trx_upper <dbl>, pct_of_premium_all_lower <dbl>,
#> # pct_of_premium_all_upper <dbl>, trx_amt_sq <dbl>
```