TrxStats

trx_stats.TrxStats(expo, trx_types=None, percent_of=None, combine_trx=False, full_exposures_only=True, conf_int=False, conf_level=0.95, col_exposure='exposure')

Transactions study summary class

Create a summary of transaction counts, amounts, and utilization rates (a TrxStats object).

Typically, the TrxStats class constructor should not be called directly. The preferred method for creating a TrxStats object is to call the trx_stats() method on an ExposedDF object.

Parameters

Name Type Description Default
expo actxps.expose.ExposedDF An exposed data frame class required
trx_types list or str A list of transaction types to include in the output. If None is provided, all available transaction types in the trx_types property of expo will be used. None
percent_of list | str An optional list containing column names in the data property of expo to use as denominators in the calculation of utilization rates or actual-to-expected ratios. None
combine_trx bool If False, the results will contain output rows for each transaction type. If True, the results will contain aggregated experience across all transaction types. False
full_exposures_only bool If True, partially exposed records will be ignored in the results. True
conf_int bool If True, the output will include confidence intervals around the observed utilization rate and any percent_of output columns. False
conf_level float Confidence level for confidence intervals 0.95
col_exposure str Name of the column in the data property of expo containing exposures 'exposure'

Attributes

Name Type Description
data polars.DataFrame A data frame that includes columns for any grouping variables and transaction types, plus the following:

  • trx_n (the number of unique transactions)
  • trx_amt (total transaction amount)
  • trx_flag (the number of observation periods with non-zero transaction amounts)
  • exposure (total exposures)
  • avg_trx (mean transaction amount {trx_amt / trx_flag})
  • avg_all (mean transaction amount over all records {trx_amt / exposure})
  • trx_freq (transaction frequency when a transaction occurs {trx_n / trx_flag})
  • trx_util (transaction utilization per observation period {trx_flag / exposure})

If percent_of is provided, the results will also include:

  • the sum of any columns passed to percent_of with non-zero transactions (these columns include the suffix _w_trx)
  • the sum of any columns passed to percent_of
  • pct_of_{*}_w_trx (total transactions as a percentage of column {*}_w_trx)
  • pct_of_{*}_all (total transactions as a percentage of column {*})
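
The derived columns are simple ratios of the four base totals. The sketch below recomputes them for a single row using made-up totals (every number is hypothetical and chosen only so the arithmetic is easy to follow). It restates the formulas listed above and is not taken from the actxps source.

```python
# Hypothetical totals for one summary row (not real study data)
trx_n = 20         # number of unique transactions
trx_amt = 1000.0   # total transaction amount
trx_flag = 8       # observation periods with non-zero transaction amounts
exposure = 40.0    # total exposures

avg_trx = trx_amt / trx_flag     # mean amount in periods with a transaction
avg_all = trx_amt / exposure     # mean amount over all records
trx_freq = trx_n / trx_flag      # transactions per period with a transaction
trx_util = trx_flag / exposure   # share of periods with a transaction

print(avg_trx, avg_all, trx_freq, trx_util)
```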

Notes

If the ExposedDF object is grouped (see the group_by() method), the returned TrxStats object’s data will contain one row per group.

Any number of transaction types can be passed to the trx_types argument; however, each transaction type must appear in the trx_types property of the ExposedDF object. In addition, trx_stats() expects to see columns named trx_n_{*} (for transaction counts) and trx_amt_{*} (for transaction amounts) for each transaction type. To ensure .data is in the appropriate format, use the class method ExposedDF.from_DataFrame() to convert an existing data frame with transactions or use add_transactions() to attach transactions to an existing ExposedDF object.
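
To make the naming convention concrete, the sketch below builds the column names expected for a set of transaction types and reports any that are absent. Both helper functions are hypothetical illustrations and not part of actxps; the type names "Base" and "Rider" are borrowed from the example output shown later on this page.

```python
def expected_trx_columns(trx_types):
    """Return the count and amount column names expected for each type."""
    cols = []
    for trx_type in trx_types:
        cols.append(f"trx_n_{trx_type}")    # transaction counts
        cols.append(f"trx_amt_{trx_type}")  # transaction amounts
    return cols


def missing_trx_columns(columns, trx_types):
    """List expected transaction columns that are absent from `columns`."""
    present = set(columns)
    return [c for c in expected_trx_columns(trx_types) if c not in present]


# A hypothetical set of data frame column names with one column missing
columns = ["pol_num", "trx_n_Base", "trx_amt_Base", "trx_n_Rider"]
print(missing_trx_columns(columns, ["Base", "Rider"]))
```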

“Percentage of” calculations

The percent_of argument is optional. If provided, this argument must be a list with values corresponding to columns in the data property of expo containing values to use as denominators in the calculation of utilization rates or actual-to-expected ratios. Example usage:

  • In a study of partial withdrawal transactions, if percent_of refers to account values, observed withdrawal rates can be determined.
  • In a study of recurring claims, if percent_of refers to a column containing a maximum benefit amount, utilization rates can be determined.
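
As a rough illustration of the partial withdrawal case, the sketch below computes both "percentage of" ratios from a handful of invented exposure records. The field names wd and av and all values are assumptions made for this example; the code shows only the arithmetic, not how actxps computes it internally.

```python
# Hypothetical exposure records: withdrawal amount and account value
records = [
    {"wd": 0.0, "av": 10000.0},
    {"wd": 500.0, "av": 20000.0},
    {"wd": 0.0, "av": 15000.0},
    {"wd": 300.0, "av": 5000.0},
]

trx_amt = sum(r["wd"] for r in records)
av_all = sum(r["av"] for r in records)                      # all records
av_w_trx = sum(r["av"] for r in records if r["wd"] != 0)    # records with a withdrawal

pct_of_av_all = trx_amt / av_all        # withdrawals as a share of all account value
pct_of_av_w_trx = trx_amt / av_w_trx    # share of account value on withdrawing records

print(pct_of_av_all, pct_of_av_w_trx)
```

The _w_trx ratio is never smaller than the _all ratio, because its denominator is restricted to records that had a transaction.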

Confidence intervals

If conf_int is set to True, the output will contain lower and upper confidence interval limits for the observed utilization rate and any percent_of output columns. The confidence level is dictated by conf_level.

  • Intervals for the utilization rate (trx_util) assume a binomial distribution.

  • Intervals for transactions as a percentage of another column with non-zero transactions (pct_of_{*}_w_trx) are constructed using a normal distribution.

  • Intervals for transactions as a percentage of another column regardless of transaction utilization (pct_of_{*}_all) are calculated assuming that the aggregate distribution is normal with a mean equal to observed transactions and a variance equal to:

    Var(S) = E(N) * Var(X) + E(X)**2 * Var(N),

where S is the aggregate transactions random variable, X is an individual transaction amount assumed to follow a normal distribution, and N is a binomial random variable for transaction utilization.
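
The variance formula can be evaluated directly from aggregate totals. The following is a minimal sketch under stated assumptions: E(X) and Var(X) are estimated from the total and sum of squared amounts over periods with transactions (population form), N is binomial with the observed utilization rate, and both interval limits are divided by the "percent of" total. The function names and inputs are hypothetical, and the estimators actxps uses may differ in detail.

```python
from math import sqrt
from statistics import NormalDist


def aggregate_variance(trx_amt, trx_amt_sq, trx_flag, exposure):
    """Var(S) = E(N) * Var(X) + E(X)**2 * Var(N) from aggregate totals."""
    p = trx_flag / exposure                   # observed utilization rate
    e_n = exposure * p                        # E(N), N ~ Binomial(exposure, p)
    var_n = exposure * p * (1 - p)            # Var(N)
    e_x = trx_amt / trx_flag                  # E(X), mean amount per period with a transaction
    var_x = trx_amt_sq / trx_flag - e_x ** 2  # Var(X), population form (an assumption)
    return e_n * var_x + e_x ** 2 * var_n


def pct_of_all_interval(trx_amt, trx_amt_sq, trx_flag, exposure,
                        percent_of_total, conf_level=0.95):
    """Normal interval around trx_amt, expressed as a share of percent_of_total."""
    sd_s = sqrt(aggregate_variance(trx_amt, trx_amt_sq, trx_flag, exposure))
    z = NormalDist().inv_cdf(1 - (1 - conf_level) / 2)
    return ((trx_amt - z * sd_s) / percent_of_total,
            (trx_amt + z * sd_s) / percent_of_total)


# Hypothetical totals, for illustration only
lower, upper = pct_of_all_interval(
    trx_amt=1000.0, trx_amt_sq=150000.0, trx_flag=8, exposure=40.0,
    percent_of_total=50000.0)
print(round(lower, 4), round(upper, 4))
```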

Default removal of partial exposures

As a default, partial exposures are removed from data before summarizing results. This is done to avoid complexity associated with a lopsided skew in the timing of transactions. For example, if transactions can occur on a monthly basis or annually at the beginning of each policy year, partial exposures may not be appropriate. If a policy had an exposure of 0.5 years and was taking withdrawals annually at the beginning of the year, an argument could be made that the exposure should instead be 1 complete year. If the same policy was expected to take withdrawals 9 months into the year, it’s not clear if the exposure should be 0.5 years or 0.5 / 0.75 years. To override this treatment, set full_exposures_only to False.
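
Conceptually, the default treatment amounts to dropping every record whose exposure is not a full period. The sketch below illustrates that idea on plain dictionaries; the helper name and the tolerance-based comparison are assumptions, not a description of how actxps filters its data.

```python
from math import isclose


def keep_full_exposures(records, col_exposure="exposure"):
    """Keep only records whose exposure equals one full period."""
    return [r for r in records if isclose(r[col_exposure], 1.0)]


# Hypothetical exposure records; two of them are partial
records = [
    {"pol_num": 1, "exposure": 1.0},
    {"pol_num": 2, "exposure": 0.5},
    {"pol_num": 3, "exposure": 1.0},
    {"pol_num": 4, "exposure": 0.25},
]
full = keep_full_exposures(records)
print([r["pol_num"] for r in full])
```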

Alternative class constructor

TrxStats.from_DataFrame() can be used to coerce a data frame containing pre-aggregated experience into a TrxStats object. This is most useful for working with industry study data where individual exposure records are not available.

Methods

Name Description
from_DataFrame Convert a data frame containing aggregate transaction experience study results to the TrxStats class
plot Plot transaction study results
plot_utilization_rates Plot transaction frequency and severity.
summary Re-summarize transaction experience data
table Tabular transaction study summary

from_DataFrame

trx_stats.TrxStats.from_DataFrame(data, conf_int=False, conf_level=0.95, col_trx_amt='trx_amt', col_trx_n='trx_n', col_trx_flag='trx_flag', col_exposure='exposure', col_percent_of=None, col_percent_of_w_trx=None, col_trx_amt_sq='trx_amt_sq', start_date=date(1900, 1, 1), end_date=None)

Convert a data frame containing aggregate transaction experience study results to the TrxStats class.

from_DataFrame() is most useful for working with aggregate summaries of experience that were not created by actxps where individual policy information is not available. After converting the data to the TrxStats class, summary() can be used to summarize data by any grouping variables, and plot() and table() are available for reporting.

Parameters

Name Type Description Default
data polars.DataFrame | pandas.DataFrame A DataFrame containing aggregate transaction study results. See the Notes section for required columns that must be present. required
conf_int bool If True, future calls to summary() will include confidence intervals around the observed utilization rates and any percent_of output columns. False
conf_level float Confidence level for confidence intervals. 0.95
col_trx_amt str Name of the column in data containing transaction amounts. 'trx_amt'
col_trx_n str Name of the column in data containing transaction counts. 'trx_n'
col_trx_flag str Name of the column in data containing the number of exposure records with transactions. 'trx_flag'
col_exposure str Name of the column in data containing exposures. 'exposure'
col_percent_of str Name of the column in data containing a numeric variable to use in “percent of” calculations. None
col_percent_of_w_trx str Name of the column in data containing a numeric variable to use in “percent of” calculations with transactions. None
col_trx_amt_sq str Only required when col_percent_of is passed and conf_int is True. Name of the column in data containing squared transaction amounts. 'trx_amt_sq'
start_date datetime.date | str Transaction study start date date(1900, 1, 1)
end_date datetime.date | str Transaction study end date None

Returns

Type Description
actxps.trx_stats.TrxStats A TrxStats object

Notes

At a minimum, the following columns are required:

  • Transaction amounts (trx_amt)
  • Transaction counts (trx_n)
  • The number of exposure records with transactions (trx_flag). This number is not necessarily equal to transaction counts. If multiple transactions are allowed per exposure period, trx_flag can be less than trx_n.
  • Exposures (exposure)

If transaction amounts should be expressed as a percentage of another variable (i.e. to calculate utilization rates or actual-to-expected ratios), additional columns are required:

  • A denominator “percent of” column. For example, the sum of account values.
  • A denominator “percent of” column for exposure records with transactions. For example, the sum of account values across all records with non-zero transaction amounts.

If confidence intervals are desired and “percent of” columns are passed, an additional column for the sum of squared transaction amounts (trx_amt_sq) is also required.

The names in parentheses above are expected column names. If the data frame passed to from_DataFrame() uses different column names, these can be specified using the col_* arguments.

start_date and end_date are optional arguments that are only used for printing the resulting TrxStats object.

Unlike ExposedDF.trx_stats(), from_DataFrame() only permits a single transaction type and a single percent_of column.
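
The sketch below shows one way the required columns might be derived from record-level data before calling from_DataFrame(). The record layout is hypothetical, and squaring the per-record transaction amount for trx_amt_sq is an assumption about how that column is defined. Note how trx_flag (2) ends up smaller than trx_n (3) because one record has two transactions.

```python
# Hypothetical exposure records; "trx" lists the transactions in each period
records = [
    {"exposure": 1.0, "av": 10000.0, "trx": []},
    {"exposure": 1.0, "av": 20000.0, "trx": [200.0, 300.0]},
    {"exposure": 1.0, "av": 5000.0, "trx": [100.0]},
]


def aggregate(records):
    """Sum record-level data into the columns listed in the Notes above."""
    out = {"trx_amt": 0.0, "trx_n": 0, "trx_flag": 0, "exposure": 0.0,
           "av": 0.0, "av_w_trx": 0.0, "trx_amt_sq": 0.0}
    for r in records:
        amt = sum(r["trx"])                # record-level transaction amount
        out["trx_amt"] += amt
        out["trx_n"] += len(r["trx"])      # every transaction counts
        out["exposure"] += r["exposure"]
        out["av"] += r["av"]               # "percent of" denominator
        if amt != 0:
            out["trx_flag"] += 1           # one flag per record with transactions
            out["av_w_trx"] += r["av"]     # denominator for records with transactions
            out["trx_amt_sq"] += amt ** 2  # squared record-level amount (assumption)
    return out


agg = aggregate(records)
print(agg)
```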

Examples

# convert pre-aggregated experience into a TrxStats object
import actxps as xp
import polars as pl

agg_sim_dat = xp.load_agg_sim_dat()
dat = xp.TrxStats.from_DataFrame(
    agg_sim_dat.with_columns(pl.col('n').cast(float)),
    col_exposure="n",
    col_trx_amt="wd",
    col_trx_n="wd_n",
    col_trx_flag="wd_flag",
    col_percent_of="av",
    col_percent_of_w_trx="av_w_wd",
    col_trx_amt_sq="wd_sq",
    start_date=2005, end_date=2019,
    conf_int=True)
dat

# summary by policy year
dat.summary('pol_yr')
Transaction study results

Groups: pol_yr
Study range: 2005 to 2019
Transaction types: wd
Transactions as % of: av

shape: (15, 21)
┌────────┬──────────┬───────┬──────────┬───┬─────────────┬─────────────┬─────────────┬─────────────┐
│ pol_yr ┆ trx_type ┆ trx_n ┆ trx_flag ┆ … ┆ pct_of_av_w ┆ pct_of_av_w ┆ pct_of_av_a ┆ pct_of_av_a │
│ ---    ┆ ---      ┆ ---   ┆ ---      ┆   ┆ _trx_lower  ┆ _trx_upper  ┆ ll_lower    ┆ ll_upper    │
│ i64    ┆ str      ┆ i64   ┆ i64      ┆   ┆ ---         ┆ ---         ┆ ---         ┆ ---         │
│        ┆          ┆       ┆          ┆   ┆ f64         ┆ f64         ┆ f64         ┆ f64         │
╞════════╪══════════╪═══════╪══════════╪═══╪═════════════╪═════════════╪═════════════╪═════════════╡
│ 1      ┆ wd       ┆ 16942 ┆ 7921     ┆ … ┆ 0.038276    ┆ 0.041085    ┆ 0.015332    ┆ 0.016587    │
│ 2      ┆ wd       ┆ 16900 ┆ 7923     ┆ … ┆ 0.039902    ┆ 0.043038    ┆ 0.016931    ┆ 0.01839     │
│ 3      ┆ wd       ┆ 16679 ┆ 7796     ┆ … ┆ 0.04103     ┆ 0.044065    ┆ 0.018417    ┆ 0.019919    │
│ 4      ┆ wd       ┆ 16193 ┆ 7575     ┆ … ┆ 0.040991    ┆ 0.044182    ┆ 0.019329    ┆ 0.020971    │
│ 5      ┆ wd       ┆ 15353 ┆ 7153     ┆ … ┆ 0.040572    ┆ 0.043915    ┆ 0.020021    ┆ 0.021806    │
│ …      ┆ …        ┆ …     ┆ …        ┆ … ┆ …           ┆ …           ┆ …           ┆ …           │
│ 11     ┆ wd       ┆ 7470  ┆ 3448     ┆ … ┆ 0.043407    ┆ 0.048177    ┆ 0.027074    ┆ 0.030235    │
│ 12     ┆ wd       ┆ 5079  ┆ 2314     ┆ … ┆ 0.043491    ┆ 0.049206    ┆ 0.028866    ┆ 0.032877    │
│ 13     ┆ wd       ┆ 3310  ┆ 1495     ┆ … ┆ 0.043142    ┆ 0.051286    ┆ 0.029488    ┆ 0.035291    │
│ 14     ┆ wd       ┆ 2119  ┆ 946      ┆ … ┆ 0.044544    ┆ 0.056832    ┆ 0.031782    ┆ 0.040822    │
│ 15     ┆ wd       ┆ 982   ┆ 430      ┆ … ┆ 0.043576    ┆ 0.059197    ┆ 0.033368    ┆ 0.045754    │
└────────┴──────────┴───────┴──────────┴───┴─────────────┴─────────────┴─────────────┴─────────────┘

See Also

ExposedDF.trx_stats() for information on how TrxStats objects are typically created from individual exposure records.

plot

trx_stats.TrxStats.plot(x=None, y='trx_util', color=None, facets=None, mapping=None, scales='fixed', geoms='lines', y_labels=lambda l: [f"{v * 100:.1f}%" for v in l], y_log10=False, conf_int_bars=False)

Plot transaction study results

Parameters

Name Type Description Default
x str A column name in data to use as the x variable. If None, x will default to the first grouping variable. If there are no grouping variables, x will be set to “All”. None
y str A column name in data to use as the y variable. 'trx_util'
color str A column name in data to use as the color and fill variables. If None, color will default to the second grouping variable. If there are fewer than two grouping variables, the plot will not use a color aesthetic. None
facets list | str Faceting variables in data passed to plotnine.facet_wrap(). If None, grouping variables 3+ will be used (assuming there are more than two grouping variables). None
mapping plotnine.aes Aesthetic mapping added to plotnine.ggplot(). NOTE: If mapping is supplied, the x, y, and color arguments will be ignored. None
scales str The scales argument passed to plotnine.facet_wrap(). 'fixed'
geoms 'lines', 'bars', 'points' Type of geometry. If “lines” is passed, the plot will display lines and points. If “bars”, the plot will display bars. If “points”, the plot will display points only. 'lines'
y_labels callable Label function passed to plotnine.scale_y_continuous(). lambda l: [f"{v * 100:.1f}%" for v in l]
y_log10 bool If True, the y-axes are plotted on a log-10 scale. False
conf_int_bars bool If True, confidence interval error bars are included in the plot. This option is only available for utilization rates and any pct_of columns. False
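
A label function receives the list of y-axis break values and returns one string per value. The sketch below reproduces the default shown in the table above and adds a hypothetical alternative, whole_percent_labels, that drops the decimal place.

```python
def default_labels(l):
    """Same formatting as the documented default: one decimal place."""
    return [f"{v * 100:.1f}%" for v in l]


def whole_percent_labels(l):
    """Hypothetical alternative: percentages with no decimal places."""
    return [f"{v * 100:.0f}%" for v in l]


print(default_labels([0.19, 0.505]))
print(whole_percent_labels([0.19, 0.5]))
```

A function like whole_percent_labels could then be supplied through the y_labels argument of plot().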

Notes

If no aesthetic map is supplied, the plot will use the first grouping variable in the groups property on the x axis and trx_util on the y axis. In addition, the second grouping variable in groups will be used for color and fill.

If no faceting variables are supplied, the plot will use grouping variables 3 and up as facets. These variables are passed into plotnine.facet_wrap().
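
The default rules can be summarized as a small function of the grouping variables. This is only a restatement of the behavior described above; default_aesthetics is a hypothetical helper, and "product" is an invented third grouping variable.

```python
def default_aesthetics(groups, y="trx_util"):
    """Map grouping variables to plot roles following the documented defaults."""
    groups = list(groups)
    return {
        "x": groups[0] if groups else "All",                # first group, else "All"
        "y": y,
        "color": groups[1] if len(groups) > 1 else None,    # second group, if any
        "facets": groups[2:] if len(groups) > 2 else None,  # groups 3 and up
    }


print(default_aesthetics(["pol_yr"]))
print(default_aesthetics(["pol_yr", "inc_guar", "product"]))
```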

Examples

import actxps as xp
census = xp.load_census_dat()
withdrawals = xp.load_withdrawals()
expo = xp.ExposedDF.expose_py(census, "2019-12-31",
                              target_status="Surrender")
expo.add_transactions(withdrawals)

trx_res = (expo.group_by('pol_yr').
           trx_stats(percent_of='premium'))

trx_res.plot()

<Figure Size: (640 x 480)>

plot_utilization_rates

trx_stats.TrxStats.plot_utilization_rates(**kwargs)

Plot transaction frequency and severity.

Frequency is represented by utilization rates (trx_util). Severity is represented by transaction amounts as a percentage of one or more other columns in the data ({*}_w_trx). All severity series begin with the prefix “pct_of_” and end with the suffix “_w_trx”. The suffix refers to the fact that the denominator only includes records with non-zero transactions. Severity series are based on column names passed to the percent_of argument in trx_stats(). If no “percentage of” columns exist, this function will only plot utilization rates.
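
Following the naming rule above, the set of plotted series can be derived from the percent_of columns. The helper below is a hypothetical illustration of that rule, not a function provided by actxps.

```python
def utilization_series(percent_of=None):
    """Return the series names implied by the naming rule described above."""
    if percent_of is None:
        percent_of = []
    elif isinstance(percent_of, str):
        percent_of = [percent_of]   # a single column name is also accepted
    # Frequency first, then one severity series per "percent of" column
    return ["trx_util"] + [f"pct_of_{col}_w_trx" for col in percent_of]


print(utilization_series())
print(utilization_series("av_anniv"))
```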

Parameters

Name Type Description Default
**kwargs Additional arguments passed to plot() {}

Examples

import actxps as xp
census = xp.load_census_dat()
withdrawals = xp.load_withdrawals()
account_vals = xp.load_account_vals()
expo = xp.ExposedDF.expose_py(census, "2019-12-31",
                              target_status="Surrender")
expo.add_transactions(withdrawals)
expo.data = expo.data.join(account_vals, how='left',
                           on=["pol_num", "pol_date_yr"])        

trx_res = (expo.group_by('pol_yr').
           trx_stats(percent_of='av_anniv', combine_trx=True))

trx_res.plot_utilization_rates()

<Figure Size: (640 x 480)>

summary

trx_stats.TrxStats.summary(*by)

Re-summarize transaction experience data

Re-summarize the data while retaining any grouping variables passed to the *by argument.
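
Re-summarizing generally means summing the additive totals within each new group and then recomputing ratios from those sums, rather than averaging the existing ratios. The sketch below illustrates this on plain dictionaries; resummarize and the input rows are hypothetical and cover only a few of the output columns.

```python
from collections import defaultdict

ADDITIVE = ("trx_n", "trx_amt", "trx_flag", "exposure")


def resummarize(rows, *by):
    """Sum additive columns within each group, then recompute ratios."""
    totals = defaultdict(lambda: dict.fromkeys(ADDITIVE, 0.0))
    for row in rows:
        key = tuple(row[b] for b in by)   # empty tuple when no groups are given
        for col in ADDITIVE:
            totals[key][col] += row[col]
    out = []
    for key, t in totals.items():
        rec = dict(zip(by, key))
        rec.update(t)
        rec["trx_util"] = t["trx_flag"] / t["exposure"]
        rec["avg_trx"] = t["trx_amt"] / t["trx_flag"] if t["trx_flag"] else float("nan")
        out.append(rec)
    return out


# Hypothetical summary rows grouped by inc_guar and pol_yr
rows = [
    {"inc_guar": True, "pol_yr": 1, "trx_n": 4, "trx_amt": 400.0, "trx_flag": 2, "exposure": 10.0},
    {"inc_guar": True, "pol_yr": 2, "trx_n": 6, "trx_amt": 800.0, "trx_flag": 4, "exposure": 10.0},
    {"inc_guar": False, "pol_yr": 1, "trx_n": 2, "trx_amt": 100.0, "trx_flag": 1, "exposure": 5.0},
]
by_guar = resummarize(rows, "inc_guar")
overall = resummarize(rows)   # no grouping variables: a single row
print(by_guar)
print(overall)
```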

Parameters

Name Type Description Default
*by Column names in data that will be used as grouping variables in the re-summarized object. Passing nothing is acceptable and will produce a 1-row experience summary. ()

Returns

Type Description
actxps.trx_stats.TrxStats A new TrxStats object with rows for all the unique groups in *by

Examples

import actxps as xp
census = xp.load_census_dat()
withdrawals = xp.load_withdrawals()
expo = xp.ExposedDF.expose_py(census, "2019-12-31",
                              target_status="Surrender")
expo.add_transactions(withdrawals)

trx_res = (expo.group_by('inc_guar', 'pol_yr').
           trx_stats(percent_of='premium'))
trx_res.summary('inc_guar')
Transaction study results

Groups: inc_guar
Study range: 1900-01-01 to 2019-12-31
Transaction types: Base, Rider
Transactions as % of: premium

shape: (4, 14)
┌──────────┬──────────┬─────────┬──────────┬───┬──────────┬──────────┬──────────────┬──────────────┐
│ inc_guar ┆ trx_type ┆ trx_n   ┆ trx_flag ┆ … ┆ trx_freq ┆ trx_util ┆ pct_of_premi ┆ pct_of_premi │
│ ---      ┆ ---      ┆ ---     ┆ ---      ┆   ┆ ---      ┆ ---      ┆ um_all       ┆ um_w_trx     │
│ bool     ┆ str      ┆ f64     ┆ u32      ┆   ┆ f64      ┆ f64      ┆ ---          ┆ ---          │
│          ┆          ┆         ┆          ┆   ┆          ┆          ┆ f64          ┆ f64          │
╞══════════╪══════════╪═════════╪══════════╪═══╪══════════╪══════════╪══════════════╪══════════════╡
│ false    ┆ Base     ┆ 52939.0 ┆ 24703    ┆ … ┆ 2.143019 ┆ 0.504782 ┆ 0.014557     ┆ 0.028089     │
│ false    ┆ Rider    ┆ 0.0     ┆ 0        ┆ … ┆ NaN      ┆ 0.0      ┆ 0.0          ┆ NaN          │
│ true     ┆ Base     ┆ 7561.0  ┆ 3521     ┆ … ┆ 2.147401 ┆ 0.0468   ┆ 0.0014       ┆ 0.028594     │
│ true     ┆ Rider    ┆ 77321.0 ┆ 35941    ┆ … ┆ 2.151331 ┆ 0.477716 ┆ 0.028166     ┆ 0.059362     │
└──────────┴──────────┴─────────┴──────────┴───┴──────────┴──────────┴──────────────┴──────────────┘

table

trx_stats.TrxStats.table(fontsize=100, decimals=1, colorful=True, color_util='GnBu', color_pct_of='RdBu', show_conf_int=False, decimals_amt=0, suffix_amt=False, **rename_cols)

Tabular transaction study summary

Convert transaction study results to a presentation-friendly format.

Parameters

Name Type Description Default
fontsize int Font size percentage multiplier 100
decimals int Number of decimals to display for percentages 1
colorful bool If True, color will be added to the observed utilization rate and “percentage of” columns. True
color_util str ColorBrewer palette used for the observed utilization rate. 'GnBu'
color_pct_of str ColorBrewer palette used for “percentage of” columns. 'RdBu'
show_conf_int bool If True, any confidence intervals will be displayed. False
decimals_amt int Number of decimals to display for amount columns (transaction counts, total transactions, and average transactions) 0
suffix_amt bool This argument has the same meaning as the compact argument in great_tables.gt.GT.fmt_number() for amount columns. If False, no scaling or suffixing are applied to amount columns. If True, all amount columns are automatically scaled and suffixed by “K” (thousands), “M” (millions), “B” (billions), or “T” (trillions). False
rename_cols str Key-value pairs where keys are column names and values are labels that will appear on the output table. This parameter is useful for renaming grouping variables that will appear under their original variable names if left unchanged. None
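
To give a feel for what scaling and suffixing amount columns looks like, the sketch below applies the K/M/B/T rule to a few numbers. It is a rough approximation written for illustration only: compact_amount is a hypothetical helper, it does not call great_tables, and the library's own rounding and edge-case behavior may differ.

```python
def compact_amount(x, decimals=0):
    """Scale a number and append K, M, B, or T (illustrative approximation)."""
    for divisor, suffix in ((1e12, "T"), (1e9, "B"), (1e6, "M"), (1e3, "K")):
        if abs(x) >= divisor:
            return f"{x / divisor:,.{decimals}f}{suffix}"
    return f"{x:,.{decimals}f}"   # values under 1,000 are left unscaled


for amount in (944, 119877, 2_500_000):
    print(amount, "->", compact_amount(amount, decimals=1))
```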

Notes

Further customizations can be added using great_tables.gt.GT methods. See the great_tables package documentation for more information.

Returns

Type Description
great_tables.gt.GT A formatted HTML table

Examples

import actxps as xp
census = xp.load_census_dat()
withdrawals = xp.load_withdrawals()
expo = xp.ExposedDF.expose_py(census, "2019-12-31",
                              target_status="Surrender")
expo.add_transactions(withdrawals)

trx_res = (expo.group_by('pol_yr').
           trx_stats(percent_of='premium'))

trx_res.table()
Transaction Study Results
Transaction types: Base, Rider
pol_yr | Counts: Total | Counts: Periods | Amount | Averages: w/ trx | Averages: all | Frequency | Utilization | % of premium: w/ trx | % of premium: all
Base
1 7,447 3,514 119,877 34 6 2.1 19.0% 2.5% 0.5%
2 7,274 3,422 116,967 34 7 2.1 20.2% 2.5% 0.5%
3 7,061 3,309 116,357 35 8 2.1 21.5% 2.6% 0.6%
4 6,596 3,080 114,987 37 8 2.1 22.3% 2.8% 0.6%
5 6,093 2,847 109,918 39 9 2.1 23.3% 2.8% 0.7%
6 5,543 2,572 97,455 38 9 2.2 24.0% 2.8% 0.7%
7 4,921 2,297 92,797 40 10 2.1 24.7% 2.9% 0.7%
8 4,200 1,964 85,740 44 11 2.1 25.2% 3.1% 0.8%
9 3,579 1,655 70,715 43 11 2.2 26.0% 3.0% 0.8%
10 3,004 1,376 57,935 42 11 2.2 27.2% 3.0% 0.8%
11 2,428 1,115 53,809 48 14 2.2 28.6% 3.4% 1.0%
12 1,320 605 30,425 50 13 2.2 26.7% 3.2% 0.9%
13 700 319 17,809 56 14 2.2 24.9% 3.3% 0.9%
14 315 141 8,883 63 15 2.2 24.6% 3.6% 1.0%
15 19 8 225 28 11 2.4 38.1% 3.0% 1.0%
Rider
1 8,077 3,778 265,312 70 14 2.1 20.4% 5.3% 1.1%
2 8,232 3,834 288,114 75 17 2.1 22.6% 5.7% 1.3%
3 8,204 3,817 294,795 77 19 2.1 24.8% 5.8% 1.4%
4 7,960 3,715 283,763 76 21 2.1 26.9% 5.8% 1.5%
5 7,536 3,521 264,939 75 22 2.1 28.8% 5.7% 1.6%
6 7,118 3,342 264,516 79 25 2.1 31.2% 5.9% 1.8%
7 6,631 3,097 251,502 81 27 2.1 33.3% 6.1% 2.0%
8 5,952 2,773 226,770 82 29 2.1 35.6% 6.1% 2.1%
9 5,173 2,406 204,000 85 32 2.2 37.8% 6.3% 2.3%
10 4,331 1,998 174,457 87 35 2.2 39.5% 6.5% 2.5%
11 3,468 1,597 139,620 87 36 2.2 41.0% 6.5% 2.6%
12 2,398 1,076 90,469 84 40 2.2 47.4% 6.2% 2.8%
13 1,486 662 62,880 95 49 2.2 51.7% 6.7% 3.2%
14 736 316 30,648 97 53 2.3 55.1% 7.1% 3.6%
15 19 9 944 105 45 2.1 42.9% 9.0% 4.2%
Study range: 1900-01-01 to 2019-12-31