ExposedDF

expose.ExposedDF(self, data, end_date, start_date=date(1900, 1, 1), target_status=None, cal_expo=False, expo_length='year', col_pol_num='pol_num', col_status='status', col_issue_date='issue_date', col_term_date='term_date', default_status=None)

Exposed data frame class

Convert a data frame of census-level records into an object with exposure-level records.

Parameters

Name	Type	Description	Default
`data`	polars.DataFrame | pandas.DataFrame	A data frame with census-level records	required
`end_date`	datetime.date | str	Experience study end date. If a string is passed, it must be in %Y-%m-%d format.	required
`start_date`	datetime.date | str	Experience study start date. If a string is passed, it must be in %Y-%m-%d format.	`date(1900, 1, 1)`
`target_status`	str | list | numpy.ndarray	Target status values	`None`
`cal_expo`	bool	Set to `True` for calendar year exposures. Otherwise policy year exposures are assumed.	`False`
`expo_length`	(year, quarter, month, week)	Exposure period length	`'year'`
`col_pol_num`	str	Name of the column in `data` containing the policy number	`'pol_num'`
`col_status`	str	Name of the column in `data` containing the policy status	`'status'`
`col_issue_date`	str	Name of the column in `data` containing the issue date	`'issue_date'`
`col_term_date`	str	Name of the column in `data` containing the termination date	`'term_date'`
`default_status`	str	Default active status code. If `None`, the most common status is assumed.	`None`

Attributes

Name	Type	Description
data	polars.DataFrame	A Polars data frame with exposure-level records. The results include all existing columns in the original input data plus new columns for exposures and observation periods. Observation periods include counters for policy exposures, start dates, and end dates. Both start dates and end dates are inclusive bounds. For policy year exposures, two observation period columns are returned. Columns beginning with `pol_` are integer policy periods. Columns beginning with `pol_date_` are calendar dates representing anniversary dates, monthiversary dates, etc.
end_date, start_date, target_status, cal_expo, expo_length, default_status		Values passed on class instantiation. See Parameters for definitions.
exposure_type	str	A description of the exposure type that combines the `cal_expo` and `expo_length` properties
date_cols	tuple	Names of the start and end date columns in `data` for each exposure period
trx_types	list	List of transaction types that have been attached to `data` using the `add_transactions()` method.

Notes

Census-level data refers to a data set wherein there is one row per unique policy. Exposure-level data expands census-level data such that there is one record per policy per observation period. Observation periods could be any meaningful period of time such as a policy year, policy month, calendar year, calendar quarter, calendar month, etc.

target_status is used in the calculation of exposures. The annual exposure method is applied, which allocates a full period of exposure for any statuses in target_status. For all other statuses, new entrants and exits are partially exposed based on the time elapsed in the observation period. This method is consistent with the Balducci Hypothesis, which assumes that the probability of termination is proportionate to the time elapsed in the observation period. If the annual exposure method isn’t desired, target_status can be ignored. In this case, partial exposures are always applied regardless of status.
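The exposure rules above can be sketched with the standard library alone. This is a simplified illustration, not the library's implementation: the helper names `add_years` and `policy_year_exposures` are hypothetical, the study start date is ignored, and partial exposures are assumed to be the inclusive day count observed divided by the days in the policy year. The sample policy mirrors policy 3 from the toy census shown in the Examples section.

```python
from datetime import date, timedelta


def add_years(d, n):
    """Shift a date by n years, mapping Feb 29 to Feb 28 when needed."""
    try:
        return d.replace(year=d.year + n)
    except ValueError:
        return d.replace(year=d.year + n, day=28)


def policy_year_exposures(issue_date, term_date, status, end_date,
                          target_status=None):
    """Return (pol_yr, period_start, period_end, exposure) tuples."""
    targets = set(target_status or [])
    # Last day on which the policy is observed within the study
    last_day = min(term_date, end_date) if term_date else end_date
    terminated = term_date is not None and term_date <= end_date
    rows = []
    pol_yr = 1
    while True:
        start = add_years(issue_date, pol_yr - 1)
        if start > last_day:
            break
        end = add_years(issue_date, pol_yr) - timedelta(days=1)
        if end <= last_day:
            exposure = 1.0  # fully observed policy year
        elif terminated and status in targets:
            exposure = 1.0  # target statuses receive a full period
        else:
            # Partial exposure: inclusive days observed / days in the period
            exposure = ((last_day - start).days + 1) / ((end - start).days + 1)
        rows.append((pol_yr, start, end, exposure))
        pol_yr += 1
    return rows


rows = policy_year_exposures(date(2009, 11, 10), None, "Active",
                             date(2020, 12, 31), target_status=["Surrender"])
print(len(rows), round(rows[-1][3], 6))  # 12 0.142466
```

The final policy year is observed for 52 of its 365 days, which gives the partial exposure of 0.142466.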

default_status is used to indicate the default active status that should be used when exposure records are created. If None, then the most common status will be assumed.

Alternative class constructors

expose_py(), expose_pq(), expose_pm(), expose_pw(), expose_cy(), expose_cq(), expose_cm(), expose_cw()

Convenience constructor functions for specific exposure calculations. The two characters after the underscore describe the exposure type and exposure period, respectively.

For exposure types:
p refers to policy years
c refers to calendar years

For exposure periods:
y = years
q = quarters
m = months
w = weeks

Each constructor has the same inputs as the __init__ method except that the expo_length and cal_expo arguments are prepopulated.

from_DataFrame()

Convert a data frame that already has exposure-level records into an ExposedDF object.
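The naming convention of the expose_*() convenience constructors can be decoded mechanically. The sketch below is illustrative only: `constructor_args` and the two lookup tables are hypothetical names, and the mapping of p and c to `cal_expo` follows the parameter description (`True` for calendar exposures).

```python
# Exposure type letter -> value that would be passed as cal_expo
EXPOSURE_TYPES = {"p": False, "c": True}
# Exposure period letter -> value that would be passed as expo_length
EXPOSURE_PERIODS = {"y": "year", "q": "quarter", "m": "month", "w": "week"}


def constructor_args(name):
    """Map a name like 'expose_cq' to its prepopulated arguments."""
    suffix = name.split("_", 1)[1]
    kind, period = suffix[0], suffix[1]
    return {"cal_expo": EXPOSURE_TYPES[kind],
            "expo_length": EXPOSURE_PERIODS[period]}


print(constructor_args("expose_cq"))
# {'cal_expo': True, 'expo_length': 'quarter'}
```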

References

Atkinson and McGarry (2016). Experience Study Calculations

https://www.soa.org/49378a/globalassets/assets/files/research/experience-study-calculations.pdf

Examples

import actxps as xp

xp.ExposedDF(xp.load_toy_census(), "2020-12-31", 
             target_status='Surrender')

Exposure data

Exposure type: policy_year
Target status: Surrender
Study range: 1900-01-01 to 2020-12-31

shape: (33, 8)
┌─────────┬────────┬────────────┬───────────┬────────┬─────────────┬─────────────────┬──────────┐
│ pol_num ┆ status ┆ issue_date ┆ term_date ┆ pol_yr ┆ pol_date_yr ┆ pol_date_yr_end ┆ exposure │
│ ---     ┆ ---    ┆ ---        ┆ ---       ┆ ---    ┆ ---         ┆ ---             ┆ ---      │
│ i64     ┆ enum   ┆ date       ┆ date      ┆ u32    ┆ date        ┆ date            ┆ f64      │
╞═════════╪════════╪════════════╪═══════════╪════════╪═════════════╪═════════════════╪══════════╡
│ 1       ┆ Active ┆ 2010-01-01 ┆ null      ┆ 1      ┆ 2010-01-01  ┆ 2010-12-31      ┆ 1.0      │
│ 1       ┆ Active ┆ 2010-01-01 ┆ null      ┆ 2      ┆ 2011-01-01  ┆ 2011-12-31      ┆ 1.0      │
│ 1       ┆ Active ┆ 2010-01-01 ┆ null      ┆ 3      ┆ 2012-01-01  ┆ 2012-12-31      ┆ 1.0      │
│ 1       ┆ Active ┆ 2010-01-01 ┆ null      ┆ 4      ┆ 2013-01-01  ┆ 2013-12-31      ┆ 1.0      │
│ 1       ┆ Active ┆ 2010-01-01 ┆ null      ┆ 5      ┆ 2014-01-01  ┆ 2014-12-31      ┆ 1.0      │
│ …       ┆ …      ┆ …          ┆ …         ┆ …      ┆ …           ┆ …               ┆ …        │
│ 3       ┆ Active ┆ 2009-11-10 ┆ null      ┆ 8      ┆ 2016-11-10  ┆ 2017-11-09      ┆ 1.0      │
│ 3       ┆ Active ┆ 2009-11-10 ┆ null      ┆ 9      ┆ 2017-11-10  ┆ 2018-11-09      ┆ 1.0      │
│ 3       ┆ Active ┆ 2009-11-10 ┆ null      ┆ 10     ┆ 2018-11-10  ┆ 2019-11-09      ┆ 1.0      │
│ 3       ┆ Active ┆ 2009-11-10 ┆ null      ┆ 11     ┆ 2019-11-10  ┆ 2020-11-09      ┆ 1.0      │
│ 3       ┆ Active ┆ 2009-11-10 ┆ null      ┆ 12     ┆ 2020-11-10  ┆ 2021-11-09      ┆ 0.142466 │
└─────────┴────────┴────────────┴───────────┴────────┴─────────────┴─────────────────┴──────────┘

Methods

Name	Description
add_transactions	Add transactions to an experience study
exp_stats	Summarize experience study records
expose_cm	Create an `ExposedDF` with calendar month exposures
expose_cq	Create an `ExposedDF` with calendar quarter exposures
expose_cw	Create an `ExposedDF` with calendar week exposures
expose_cy	Create an `ExposedDF` with calendar year exposures
expose_pm	Create an `ExposedDF` with policy month exposures
expose_pq	Create an `ExposedDF` with policy quarter exposures
expose_pw	Create an `ExposedDF` with policy week exposures
expose_py	Create an `ExposedDF` with policy year exposures
expose_split	Split calendar exposures by policy year
from_DataFrame	Coerce a data frame to an `ExposedDF` object
group_by	Set grouping variables for summary methods like `exp_stats()` and `trx_stats()`
trx_stats	Summarize transactions and utilization rates
ungroup	Remove all grouping variables for summary methods like `exp_stats()` and `trx_stats()`

add_transactions

expose.ExposedDF.add_transactions(trx_data, col_pol_num='pol_num', col_trx_date='trx_date', col_trx_type='trx_type', col_trx_amt='trx_amt')

Add transactions to an experience study

Parameters

Name	Type	Description	Default
`trx_data`	polars.DataFrame | pandas.DataFrame	A data frame containing transaction details. This data frame must have columns for policy numbers, transaction dates, transaction types, and transaction amounts.	required
`col_pol_num`	str	Name of the column in `trx_data` containing the policy number	`'pol_num'`
`col_trx_date`	str	Name of the column in `trx_data` containing the transaction date	`'trx_date'`
`col_trx_type`	str	Name of the column in `trx_data` containing the transaction type	`'trx_type'`
`col_trx_amt`	str	Name of the column in `trx_data` containing the transaction amount	`'trx_amt'`

Notes

This function attaches transactions to an ExposedDF object. Transactions are grouped and summarized such that the number of rows in the data does not change. Two columns are added to the output for each transaction type. These columns have names of the pattern trx_n_{*} (transaction counts) and trx_amt_{*} (transaction amounts). The trx_types property is updated to include the new transaction types found in trx_data.

Transactions are associated with the data object by matching transaction dates with the exposure date ranges found in the ExposedDF.
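A minimal standard-library sketch of this matching logic follows. It is an illustration under stated assumptions, not the library's implementation: the records are plain dictionaries, `attach_transactions` is a hypothetical helper, and the `start` and `end` keys stand in for the exposure date columns.

```python
from collections import defaultdict
from datetime import date

# Two exposure records for one policy (start and end dates are inclusive)
exposures = [
    {"pol_num": 1, "start": date(2020, 1, 1), "end": date(2020, 12, 31)},
    {"pol_num": 1, "start": date(2021, 1, 1), "end": date(2021, 12, 31)},
]
transactions = [
    {"pol_num": 1, "trx_date": date(2020, 3, 1), "trx_type": "Base", "trx_amt": 100.0},
    {"pol_num": 1, "trx_date": date(2020, 9, 1), "trx_type": "Base", "trx_amt": 50.0},
    {"pol_num": 1, "trx_date": date(2021, 2, 1), "trx_type": "Rider", "trx_amt": 25.0},
]


def attach_transactions(exposures, transactions):
    """Summarize transactions per exposure record without adding rows."""
    trx_types = sorted({t["trx_type"] for t in transactions})
    out = []
    for expo in exposures:
        counts = defaultdict(int)
        amounts = defaultdict(float)
        for t in transactions:
            # Match on policy number and on the inclusive exposure date range
            if (t["pol_num"] == expo["pol_num"]
                    and expo["start"] <= t["trx_date"] <= expo["end"]):
                counts[t["trx_type"]] += 1
                amounts[t["trx_type"]] += t["trx_amt"]
        row = dict(expo)
        for typ in trx_types:
            row[f"trx_n_{typ}"] = counts[typ]
            row[f"trx_amt_{typ}"] = amounts[typ]
        out.append(row)
    return out, trx_types


result, trx_types = attach_transactions(exposures, transactions)
```

The output keeps one row per exposure record and gains one count column and one amount column per transaction type, with zeros where no transaction falls in the period.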

Examples

import actxps as xp
census = xp.load_census_dat()
withdrawals = xp.load_withdrawals()
expo = xp.ExposedDF.expose_py(census, "2019-12-31",
                              target_status="Surrender")
expo.add_transactions(withdrawals)

Exposure data

Exposure type: policy_year
Target status: Surrender
Study range: 1900-01-01 to 2019-12-31
Transaction types: Base, Rider

shape: (141_252, 19)
┌─────────┬────────┬────────────┬──────────┬───┬────────────┬────────────┬────────────┬────────────┐
│ pol_num ┆ status ┆ issue_date ┆ inc_guar ┆ … ┆ trx_n_Base ┆ trx_n_Ride ┆ trx_amt_Ba ┆ trx_amt_Ri │
│ ---     ┆ ---    ┆ ---        ┆ ---      ┆   ┆ ---        ┆ r          ┆ se         ┆ der        │
│ i64     ┆ enum   ┆ date       ┆ bool     ┆   ┆ i32        ┆ ---        ┆ ---        ┆ ---        │
│         ┆        ┆            ┆          ┆   ┆            ┆ i32        ┆ f64        ┆ f64        │
╞═════════╪════════╪════════════╪══════════╪═══╪════════════╪════════════╪════════════╪════════════╡
│ 1       ┆ Active ┆ 2014-12-17 ┆ true     ┆ … ┆ 0          ┆ 0          ┆ 0.0        ┆ 0.0        │
│ 1       ┆ Active ┆ 2014-12-17 ┆ true     ┆ … ┆ 0          ┆ 0          ┆ 0.0        ┆ 0.0        │
│ 1       ┆ Active ┆ 2014-12-17 ┆ true     ┆ … ┆ 0          ┆ 0          ┆ 0.0        ┆ 0.0        │
│ 1       ┆ Active ┆ 2014-12-17 ┆ true     ┆ … ┆ 0          ┆ 0          ┆ 0.0        ┆ 0.0        │
│ 1       ┆ Active ┆ 2014-12-17 ┆ true     ┆ … ┆ 0          ┆ 0          ┆ 0.0        ┆ 0.0        │
│ …       ┆ …      ┆ …          ┆ …        ┆ … ┆ …          ┆ …          ┆ …          ┆ …          │
│ 20000   ┆ Active ┆ 2009-04-29 ┆ true     ┆ … ┆ 0          ┆ 1          ┆ 0.0        ┆ 547.0      │
│ 20000   ┆ Active ┆ 2009-04-29 ┆ true     ┆ … ┆ 0          ┆ 1          ┆ 0.0        ┆ 106.0      │
│ 20000   ┆ Active ┆ 2009-04-29 ┆ true     ┆ … ┆ 0          ┆ 1          ┆ 0.0        ┆ 31.0       │
│ 20000   ┆ Active ┆ 2009-04-29 ┆ true     ┆ … ┆ 0          ┆ 1          ┆ 0.0        ┆ 75.0       │
│ 20000   ┆ Active ┆ 2009-04-29 ┆ true     ┆ … ┆ 0          ┆ 1          ┆ 0.0        ┆ 466.0      │
└─────────┴────────┴────────────┴──────────┴───┴────────────┴────────────┴────────────┴────────────┘

exp_stats

expose.ExposedDF.exp_stats(target_status=None, expected=None, wt=None, conf_int=False, credibility=False, conf_level=0.95, cred_r=0.05, col_exposure='exposure')

Summarize experience study records

Create a summary of termination experience for a given target status (an ExpStats object).

Parameters

Name	Type	Description	Default
`target_status`	str | list | numpy.ndarray	A single string, list, or array of target status values	`None`
`expected`	str | list | numpy.ndarray	A single string, list, or array of column names in the `data` property with expected values	`None`
`wt`	str	Name of the column in the `data` property containing weights to use in the calculation of claims, exposures, and partial credibility.	`None`
`conf_int`	bool	If `True`, the output will include confidence intervals around the observed termination rates and any actual-to-expected ratios.	`False`
`credibility`	bool	Whether the output should include partial credibility weights and credibility-weighted decrement rates.	`False`
`conf_level`	float	Confidence level under the Limited Fluctuation credibility method	`0.95`
`cred_r`	float	Error tolerance under the Limited Fluctuation credibility method	`0.05`
`col_exposure`	str	Name of the column in `data` containing exposures. Only necessary for `SplitExposedDF` objects.	`'exposure'`

Notes

If the ExposedDF object is grouped (see the group_by() method), the returned ExpStats object’s data will contain one row per group.

If nothing is passed to target_status, the target_status property of the ExposedDF object will be used. If that property is None, all status values except the first level will be assumed. This will produce a warning message.

Expected values

The expected argument is optional. If provided, this argument must be a string, list, or array with values corresponding to columns in the data property containing expected experience. More than one expected basis can be provided.

Confidence intervals

If conf_int is set to True, the output will contain lower and upper confidence interval limits for the observed termination rate and any actual-to-expected ratios. The confidence level is dictated by conf_level. If no weighting variable is passed to wt, confidence intervals will be constructed assuming a binomial distribution of claims. Otherwise, confidence intervals will be calculated assuming that the aggregate claims distribution is normal with a mean equal to observed claims and a variance equal to:

Var(S) = E(N) * Var(X) + E(X)**2 * Var(N),

where S is the aggregate claim random variable, X is the weighting variable assumed to follow a normal distribution, and N is a binomial random variable for the number of claims.
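The variance formula can be evaluated directly. The sketch below uses hypothetical inputs and hypothetical helper names (`aggregate_variance`, `normal_interval`); how the library estimates E(X), Var(X), and the binomial parameters from the data is not shown here and should not be inferred from this example.

```python
from statistics import NormalDist


def aggregate_variance(e_n, var_n, e_x, var_x):
    """Var(S) = E(N) * Var(X) + E(X)**2 * Var(N)."""
    return e_n * var_x + e_x ** 2 * var_n


def normal_interval(mean, variance, conf_level=0.95):
    """Two-sided interval for a normal variable with the given moments."""
    z = NormalDist().inv_cdf((1 + conf_level) / 2)
    half_width = z * variance ** 0.5
    return mean - half_width, mean + half_width


# Hypothetical inputs: 1,000 exposed records with a 5% termination rate,
# and weights with mean 200 and variance 2,500
n, q = 1000, 0.05
e_n, var_n = n * q, n * q * (1 - q)  # binomial mean and variance
e_x, var_x = 200.0, 2500.0

var_s = aggregate_variance(e_n, var_n, e_x, var_x)
lower, upper = normal_interval(e_n * e_x, var_s)
```

When the weights are constant (Var(X) = 0 and E(X) = 1), the formula collapses to the binomial variance of the claim count.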

If credibility is True and expected values are passed to expected, the output will also contain confidence intervals for any credibility-weighted termination rates.

Credibility

If credibility is set to True, the output will contain a credibility column equal to the partial credibility estimate under the Limited Fluctuation credibility method (also known as Classical Credibility) assuming a binomial distribution of claims.
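One common formulation of limited fluctuation partial credibility for binomial claims is the square-root rule sketched below. This is an assumption for illustration, not a statement of the library's exact formula: `partial_credibility` is a hypothetical helper, and the full-credibility standard is taken to be (z / r)^2 * (1 - q) expected claims, where z is the normal quantile for `conf_level` and r is `cred_r`.

```python
from math import sqrt
from statistics import NormalDist


def partial_credibility(n_claims, q_obs, conf_level=0.95, cred_r=0.05):
    """Square-root rule partial credibility, capped at 1."""
    z = NormalDist().inv_cdf((1 + conf_level) / 2)
    # Claims needed for full credibility when claims are binomial
    full_standard = (z / cred_r) ** 2 * (1 - q_obs)
    if full_standard <= 0:
        return 1.0  # degenerate case: no variance left in the claim count
    return min(1.0, sqrt(n_claims / full_standard))


# Hypothetical group with 56 claims and an observed rate of about 0.73%
z_partial = partial_credibility(56, 0.007254)
```

With the default arguments, roughly 1,500 claims are needed before the estimate reaches full credibility, so a group with 56 claims receives a weight well below 1.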

Returns

Type	Description
`ExpStats`	An `ExpStats` object with a `data` property that includes columns for any grouping variables, claims, exposures, and observed decrement rates (`q_obs`). If any values are passed to `expected`, additional columns will be added for expected decrements and actual-to-expected ratios. If `credibility` is set to `True`, additional columns are added for partial credibility and credibility-weighted decrement rates (assuming values are passed to `expected`). If `conf_int` is set to `True`, additional columns are added for lower and upper confidence interval limits around the observed termination rates and any actual-to-expected ratios. Additionally, if `credibility` is `True` and expected values are passed to `expected`, the output will contain confidence intervals around credibility-weighted termination rates. Confidence interval columns include the name of the original output column suffixed by either `_lower` or `_upper`. If a value is passed to `wt`, additional columns are created containing the sum of weights (`.weight`), the sum of squared weights (`.weight_qs`), and the number of records (`.weight_n`).
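The core output columns can be illustrated with a few hypothetical records. In this sketch the observed rate is claims divided by exposure, and the expected rate is assumed to be an exposure-weighted average of the per-record expected values; that weighting is an assumption made for the example rather than something stated above.

```python
records = [
    # (status, exposure, expected rate) for hypothetical exposure records
    ("Active", 1.0, 0.02),
    ("Active", 1.0, 0.02),
    ("Surrender", 1.0, 0.04),
    ("Active", 0.5, 0.04),
]
target_status = {"Surrender"}

claims = sum(1 for status, _, _ in records if status in target_status)
exposure = sum(expo for _, expo, _ in records)
q_obs = claims / exposure

# Assumption: the expected rate is averaged using exposures as weights
expected = sum(expo * rate for _, expo, rate in records) / exposure
ae_ratio = q_obs / expected

# The same ratio reproduces a q_obs value from a results table with
# 56 claims and 7719.80545 exposures
print(round(56 / 7719.80545, 6))  # 0.007254
```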

References

Herzog, Thomas (1999). Introduction to Credibility Theory

Examples

import actxps as xp

(xp.ExposedDF(xp.load_census_dat(),
              "2019-12-31", 
              target_status="Surrender").
    group_by('pol_yr', 'inc_guar').
    exp_stats(conf_int=True))

Experience study results

Groups: pol_yr, inc_guar
Target status: Surrender
Study range: 1900-01-01 to 2019-12-31

shape: (30, 8)
┌────────┬──────────┬──────────┬────────┬──────────────┬──────────┬─────────────┬─────────────┐
│ pol_yr ┆ inc_guar ┆ n_claims ┆ claims ┆ exposure     ┆ q_obs    ┆ q_obs_lower ┆ q_obs_upper │
│ ---    ┆ ---      ┆ ---      ┆ ---    ┆ ---          ┆ ---      ┆ ---         ┆ ---         │
│ u32    ┆ bool     ┆ u32      ┆ u32    ┆ f64          ┆ f64      ┆ f64         ┆ f64         │
╞════════╪══════════╪══════════╪════════╪══════════════╪══════════╪═════════════╪═════════════╡
│ 1      ┆ false    ┆ 56       ┆ 56     ┆ 7719.80545   ┆ 0.007254 ┆ 0.005441    ┆ 0.009197    │
│ 1      ┆ true     ┆ 46       ┆ 46     ┆ 11532.402336 ┆ 0.003989 ┆ 0.002862    ┆ 0.005203    │
│ 2      ┆ false    ┆ 92       ┆ 92     ┆ 7102.810869  ┆ 0.012953 ┆ 0.010418    ┆ 0.015628    │
│ 2      ┆ true     ┆ 68       ┆ 68     ┆ 10611.955805 ┆ 0.006408 ┆ 0.0049      ┆ 0.00801     │
│ 3      ┆ false    ┆ 67       ┆ 67     ┆ 6446.913856  ┆ 0.010393 ┆ 0.008066    ┆ 0.012874    │
│ …      ┆ …        ┆ …        ┆ …      ┆ …            ┆ …        ┆ …           ┆ …           │
│ 13     ┆ true     ┆ 49       ┆ 49     ┆ 1117.137361  ┆ 0.043862 ┆ 0.032225    ┆ 0.056394    │
│ 14     ┆ false    ┆ 33       ┆ 33     ┆ 262.622262   ┆ 0.125656 ┆ 0.087578    ┆ 0.167541    │
│ 14     ┆ true     ┆ 29       ┆ 29     ┆ 609.216476   ┆ 0.047602 ┆ 0.031188    ┆ 0.065658    │
│ 15     ┆ false    ┆ 8        ┆ 8      ┆ 74.046456    ┆ 0.10804  ┆ 0.040515    ┆ 0.18907     │
│ 15     ┆ true     ┆ 9        ┆ 9      ┆ 194.128602   ┆ 0.046361 ┆ 0.020605    ┆ 0.077268    │
└────────┴──────────┴──────────┴────────┴──────────────┴──────────┴─────────────┴─────────────┘

expose_cm

expose.ExposedDF.expose_cm(data, end_date, **kwargs)

Create an ExposedDF with calendar month exposures

expose_cq

expose.ExposedDF.expose_cq(data, end_date, **kwargs)

Create an ExposedDF with calendar quarter exposures

expose_cw

expose.ExposedDF.expose_cw(data, end_date, **kwargs)

Create an ExposedDF with calendar week exposures

expose_cy

expose.ExposedDF.expose_cy(data, end_date, **kwargs)

Create an ExposedDF with calendar year exposures

expose_pm

expose.ExposedDF.expose_pm(data, end_date, **kwargs)

Create an ExposedDF with policy month exposures

expose_pq

expose.ExposedDF.expose_pq(data, end_date, **kwargs)

Create an ExposedDF with policy quarter exposures

expose_pw

expose.ExposedDF.expose_pw(data, end_date, **kwargs)

Create an ExposedDF with policy week exposures

expose_py

expose.ExposedDF.expose_py(data, end_date, **kwargs)

Create an ExposedDF with policy year exposures

expose_split

expose.ExposedDF.expose_split()

Split calendar exposures by policy year

Split calendar period exposures that cross a policy anniversary into a pre-anniversary record and a post-anniversary record.

Returns

Type	Description
SplitExposedDF	A subclass of ExposedDF with calendar period exposures split by policy year.

Notes

The ExposedDF must have calendar year, quarter, month, or week exposure records. Calendar exposures are created by passing cal_expo=True to ExposedDF (or alternatively, with the class methods ExposedDF.expose_cy(), ExposedDF.expose_cq(), ExposedDF.expose_cm(), and ExposedDF.expose_cw()).

After splitting, the resulting data will contain both calendar exposures and policy year exposures. These columns will be named ‘exposure_cal’ and ‘exposure_pol’, respectively. Calendar exposures will be in the original units passed to SplitExposedDF(). Policy exposures will always be expressed in years. Downstream functions like exp_stats() and exp_shiny() will require clarification as to which exposure basis should be used to summarize results.

After splitting, the column ‘pol_yr’ will contain policy years.
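The split can be sketched for a single calendar year with the standard library. This is a simplified illustration, not the library's implementation: `add_years` and `split_calendar_year` are hypothetical helpers, the policy is assumed to be in force for the whole calendar year, and the calendar year is assumed to be later than the issue year. The inputs mirror policy 3 of the toy census (issued 2009-11-10) in calendar year 2020.

```python
from datetime import date, timedelta


def add_years(d, n):
    """Shift a date by n years, mapping Feb 29 to Feb 28 when needed."""
    try:
        return d.replace(year=d.year + n)
    except ValueError:
        return d.replace(year=d.year + n, day=28)


def split_calendar_year(issue_date, cal_year):
    """Split one fully in-force calendar year at the policy anniversary."""
    jan1, dec31 = date(cal_year, 1, 1), date(cal_year, 12, 31)
    n = cal_year - issue_date.year  # policy year in effect on January 1
    anniversary = add_years(issue_date, n)
    days_in_cal_year = (dec31 - jan1).days + 1
    pieces = [(jan1, anniversary - timedelta(days=1), n),
              (anniversary, dec31, n + 1)]
    rows = []
    for start, end, pol_yr in pieces:
        if start > end:  # anniversary falls on January 1: nothing to split
            continue
        days = (end - start).days + 1
        pol_start = add_years(issue_date, pol_yr - 1)
        days_in_pol_year = (add_years(issue_date, pol_yr) - pol_start).days
        rows.append({"start": start, "end": end, "pol_yr": pol_yr,
                     "exposure_cal": days / days_in_cal_year,
                     "exposure_pol": days / days_in_pol_year})
    return rows


pre, post = split_calendar_year(date(2009, 11, 10), 2020)
print(pre["pol_yr"], round(pre["exposure_cal"], 6), round(pre["exposure_pol"], 6))
# 11 0.857923 0.857923
print(post["pol_yr"], round(post["exposure_cal"], 6), round(post["exposure_pol"], 6))
# 12 0.142077 0.142466
```

The two exposure bases differ in the post-anniversary record because calendar year 2020 has 366 days while policy year 12 has 365.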

Examples

import actxps as xp
toy_census = xp.load_toy_census()
expo = xp.ExposedDF.expose_cy(toy_census, "2022-12-31")
expo.expose_split()

Exposure data

Exposure type: split_year
Target status: None
Study range: 1900-01-01 to 2022-12-31

shape: (58, 9)
┌─────────┬───────────┬────────────┬────────────┬───┬────────────┬────────┬────────────┬───────────┐
│ pol_num ┆ status    ┆ issue_date ┆ term_date  ┆ … ┆ cal_yr_end ┆ pol_yr ┆ exposure_c ┆ exposure_ │
│ ---     ┆ ---       ┆ ---        ┆ ---        ┆   ┆ ---        ┆ ---    ┆ al         ┆ pol       │
│ i64     ┆ enum      ┆ date       ┆ date       ┆   ┆ date       ┆ i32    ┆ ---        ┆ ---       │
│         ┆           ┆            ┆            ┆   ┆            ┆        ┆ f64        ┆ f64       │
╞═════════╪═══════════╪════════════╪════════════╪═══╪════════════╪════════╪════════════╪═══════════╡
│ 1       ┆ Active    ┆ 2010-01-01 ┆ null       ┆ … ┆ 2010-12-31 ┆ 1      ┆ 1.0        ┆ 1.0       │
│ 1       ┆ Active    ┆ 2010-01-01 ┆ null       ┆ … ┆ 2011-12-31 ┆ 2      ┆ 1.0        ┆ 1.0       │
│ 1       ┆ Active    ┆ 2010-01-01 ┆ null       ┆ … ┆ 2012-12-31 ┆ 3      ┆ 1.0        ┆ 1.0       │
│ 1       ┆ Active    ┆ 2010-01-01 ┆ null       ┆ … ┆ 2013-12-31 ┆ 4      ┆ 1.0        ┆ 1.0       │
│ 1       ┆ Active    ┆ 2010-01-01 ┆ null       ┆ … ┆ 2014-12-31 ┆ 5      ┆ 1.0        ┆ 1.0       │
│ …       ┆ …         ┆ …          ┆ …          ┆ … ┆ …          ┆ …      ┆ …          ┆ …         │
│ 3       ┆ Active    ┆ 2009-11-10 ┆ null       ┆ … ┆ 2020-11-09 ┆ 11     ┆ 0.857923   ┆ 0.857923  │
│ 3       ┆ Active    ┆ 2009-11-10 ┆ null       ┆ … ┆ 2020-12-31 ┆ 12     ┆ 0.142077   ┆ 0.142466  │
│ 3       ┆ Active    ┆ 2009-11-10 ┆ null       ┆ … ┆ 2021-11-09 ┆ 12     ┆ 0.857534   ┆ 0.857534  │
│ 3       ┆ Active    ┆ 2009-11-10 ┆ null       ┆ … ┆ 2021-12-31 ┆ 13     ┆ 0.142466   ┆ 0.142466  │
│ 3       ┆ Surrender ┆ 2009-11-10 ┆ 2022-02-25 ┆ … ┆ 2022-11-09 ┆ 13     ┆ 0.153425   ┆ 0.153425  │
└─────────┴───────────┴────────────┴────────────┴───┴────────────┴────────┴────────────┴───────────┘

from_DataFrame

expose.ExposedDF.from_DataFrame(data, end_date, start_date=date(1900, 1, 1), target_status=None, cal_expo=False, expo_length='year', trx_types=None, col_pol_num='pol_num', col_status='status', col_exposure='exposure', col_pol_per=None, cols_dates=None, col_trx_n_='trx_n_', col_trx_amt_='trx_amt_', default_status=None)

Coerce a data frame to an ExposedDF object

The input data frame must have columns for policy numbers, statuses, exposures, policy periods (for policy exposures only), and exposure start / end dates. Optionally, if data has transaction counts and amounts by type, these can be specified without calling add_transactions().

Parameters

Name	Type	Description	Default
`data`	polars.DataFrame | pandas.DataFrame	A data frame with exposure-level records	required
`end_date`	datetime.date | str	Experience study end date	required
`start_date`	datetime.date | str	Experience study start date	`date(1900, 1, 1)`
`target_status`	str | list | numpy.ndarray	Target status values	`None`
`cal_expo`	bool	Set to `True` for calendar year exposures. Otherwise policy year exposures are assumed.	`False`
`expo_length`	str	Exposure period length. Must be ‘year’, ‘quarter’, ‘month’, or ‘week’	`'year'`
`trx_types`	list | str	List containing unique transaction types that have been attached to `data`. For each value in `trx_types`, `from_DataFrame` requires that columns exist in `data` named `trx_n_{}` and `trx_amt_{}` containing transaction counts and amounts, respectively. The prefixes "trx_n_" and "trx_amt_" can be overridden using the `col_trx_n_` and `col_trx_amt_` arguments.	`None`
`col_pol_num`	str	Name of the column in `data` containing the policy number	`'pol_num'`
`col_status`	str	Name of the column in `data` containing the policy status	`'status'`
`col_exposure`	str	Name of the column in `data` containing exposures.	`'exposure'`
`col_pol_per`	str	Name of the column in `data` containing policy exposure periods. Only necessary if `cal_expo` is `False`. The assumed default is either “pol_yr”, “pol_qtr”, “pol_mth”, or “pol_wk” depending on the value of `expo_length`.	`None`
`cols_dates`	str	Names of the columns in `data` containing exposure start and end dates. Both date ranges are assumed to be exclusive. The assumed default is of the form A_B. A is “cal” if `cal_expo` is `True` or “pol” otherwise. B is either “yr”, “qtr”, “mth”, or “wk” depending on the value of `expo_length`.	`None`
`col_trx_n_`	str	Prefix to use for columns containing transaction counts.	`"trx_n_"`
`col_trx_amt_`	str	Prefix to use for columns containing transaction amounts.	`"trx_amt_"`
`default_status`	str	Default active status code	`None`
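Before calling from_DataFrame(), it can help to check that the expected columns are present. The sketch below builds the non-date column names implied by the parameter descriptions above. `expected_columns` and `missing_columns` are hypothetical helpers, not part of the library, and the exposure start and end date columns are deliberately left out.

```python
# Default policy period column names by exposure period length
POL_PER_DEFAULTS = {"year": "pol_yr", "quarter": "pol_qtr",
                    "month": "pol_mth", "week": "pol_wk"}


def expected_columns(trx_types=(), cal_expo=False, expo_length="year",
                     col_pol_num="pol_num", col_status="status",
                     col_exposure="exposure", col_pol_per=None,
                     col_trx_n_="trx_n_", col_trx_amt_="trx_amt_"):
    """List the non-date columns that the input data is expected to have."""
    cols = [col_pol_num, col_status, col_exposure]
    if not cal_expo:
        # Policy periods are only needed for policy exposures
        cols.append(col_pol_per or POL_PER_DEFAULTS[expo_length])
    for typ in trx_types:
        cols += [col_trx_n_ + typ, col_trx_amt_ + typ]
    return cols


def missing_columns(data_columns, **kwargs):
    """Return expected columns that are absent from data_columns."""
    return [c for c in expected_columns(**kwargs) if c not in data_columns]


print(expected_columns(trx_types=["Base"]))
# ['pol_num', 'status', 'exposure', 'pol_yr', 'trx_n_Base', 'trx_amt_Base']
```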

Returns

Type	Description
actxps.expose.ExposedDF	An `ExposedDF` object.

group_by

expose.ExposedDF.group_by(*by)

Set grouping variables for summary methods like exp_stats() and trx_stats().

Parameters

Name	Type	Description	Default
`*by`		Column names in `data` that will be used as grouping variables	`()`

Notes

This function will not directly apply the DataFrame.group_by() method to the data property. Instead, it will set the groups property of the ExposedDF object. The groups property is subsequently used to group data within summary methods like exp_stats() and trx_stats().
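The deferred grouping pattern described here can be sketched with a small stand-in class. `GroupedRecords` is hypothetical and much simpler than ExposedDF; it only shows that group_by() records column names, leaves the stored rows untouched, and lets a later summary method do the actual grouping.

```python
from collections import defaultdict


class GroupedRecords:
    """Hypothetical stand-in that stores grouping names for later use."""

    def __init__(self, rows):
        self.rows = rows
        self.groups = None

    def group_by(self, *by):
        self.groups = list(by)  # remember the names; the rows are untouched
        return self

    def ungroup(self):
        self.groups = None
        return self

    def total_exposure(self):
        """Summary method that applies the stored grouping when called."""
        totals = defaultdict(float)
        for row in self.rows:
            key = tuple(row[g] for g in (self.groups or []))
            totals[key] += row["exposure"]
        return dict(totals)


obj = GroupedRecords([
    {"pol_yr": 1, "exposure": 1.0},
    {"pol_yr": 1, "exposure": 0.5},
    {"pol_yr": 2, "exposure": 1.0},
])
by_year = obj.group_by("pol_yr").total_exposure()
overall = obj.ungroup().total_exposure()
```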

trx_stats

expose.ExposedDF.trx_stats(trx_types=None, percent_of=None, combine_trx=False, full_exposures_only=True, conf_int=False, conf_level=0.95, col_exposure='exposure')

Summarize transactions and utilization rates

Create a summary of transaction counts, amounts, and utilization rates (a TrxStats object).

Parameters

Name	Type	Description	Default
`trx_types`	list or str	A list of transaction types to include in the output. If `None` is provided, all available transaction types in the `trx_types` property will be used.	`None`
`percent_of`	list or str	A list containing column names in the `data` property to use as denominators in the calculation of utilization rates or actual-to-expected ratios.	`None`
`combine_trx`	bool	If `False` (default), the results will contain output rows for each transaction type. If `True`, the results will contain aggregated results across all transaction types.	`False`
`full_exposures_only`	bool	If `True` (default), partially exposed records will be ignored in the results.	`True`
`conf_int`	bool	If `True`, the output will include confidence intervals around the observed utilization rate and any `percent_of` output columns.	`False`
`conf_level`	float	Confidence level for confidence intervals	`0.95`
`col_exposure`	str	Name of the column in the `data` property containing exposures. Only necessary for `SplitExposedDF` objects.	`'exposure'`

Notes

If the ExposedDF object is grouped (see the group_by() method), the returned TrxStats object’s data will contain one row per group.

Any number of transaction types can be passed to the trx_types argument; however, each transaction type must appear in the trx_types property of the ExposedDF object. In addition, trx_stats() expects to see columns named trx_n_{*} (for transaction counts) and trx_amt_{*} (for transaction amounts) for each transaction type. To ensure data is in the appropriate format, use the class method ExposedDF.from_DataFrame() to convert an existing data frame with transactions or use add_transactions() to attach transactions to an existing ExposedDF object.

“Percentage of” calculations

The percent_of argument is optional. If provided, this argument must be a list with values corresponding to columns in the data property containing values to use as denominators in the calculation of utilization rates or actual-to-expected ratios. Example usage:

In a study of partial withdrawal transactions, if percent_of refers to account values, observed withdrawal rates can be determined.
In a study of recurring claims, if percent_of refers to a column containing a maximum benefit amount, utilization rates can be determined.

Confidence intervals

If conf_int is set to True, the output will contain lower and upper confidence interval limits for the observed utilization rate and any percent_of output columns. The confidence level is dictated by conf_level.

Intervals for the utilization rate (trx_util) assume a binomial distribution.
Intervals for transactions as a percentage of another column with non-zero transactions (pct_of_{*}_w_trx) are constructed using a normal distribution.
Intervals for transactions as a percentage of another column regardless of transaction utilization (pct_of_{*}_all) are calculated assuming that the aggregate distribution is normal with a mean equal to observed transactions and a variance equal to:

Var(S) = E(N) * Var(X) + E(X)**2 * Var(N),

where S is the aggregate transactions random variable, X is an individual transaction amount assumed to follow a normal distribution, and N is a binomial random variable for transaction utilization.

Default removal of partial exposures

As a default, partial exposures are removed from data before summarizing results. This is done to avoid complexity associated with a lopsided skew in the timing of transactions. For example, if transactions can occur on a monthly basis or annually at the beginning of each policy year, partial exposures may not be appropriate. If a policy had an exposure of 0.5 years and was taking withdrawals annually at the beginning of the year, an argument could be made that the exposure should instead be 1 complete year. If the same policy was expected to take withdrawals 9 months into the year, it’s not clear if the exposure should be 0.5 years or 0.5 / 0.75 years. To override this treatment, set full_exposures_only to False.

Returns

Type	Description
`TrxStats`	A `TrxStats` object with a `data` property that includes columns for any grouping variables and transaction types, plus the following: - `trx_n`: the number of unique transactions. - `trx_amt`: total transaction amount - `trx_flag`: the number of observation periods with non-zero transaction amounts. - `exposure`: total exposures - `avg_trx`: mean transaction amount (`trx_amt / trx_flag`) - `avg_all`: mean transaction amount over all records (`trx_amt / exposure`) - `trx_freq`: transaction frequency when a transaction occurs (`trx_n / trx_flag`) - `trx_utilization`: transaction utilization per observation period (`trx_flag / exposure`) If `percent_of` is provided, the results will also include: - The sum of any columns passed to `percent_of` with non-zero transactions. These columns include the suffix `_w_trx`. - The sum of any columns passed to `percent_of` - `pct_of_{}_w_trx`: total transactions as a percentage of column `{}_w_trx` - `pct_of_{}_all`: total transactions as a percentage of column `{}` If `conf_int` is set to `True`, additional columns are added for lower and upper confidence interval limits around the observed utilization rate and any `percent_of` output columns. Confidence interval columns include the name of the original output column suffixed by either `_lower` or `_upper`. If values are passed to `percent_of`, an additional column is created containing the sum of squared transaction amounts (`trx_amt_sq`).
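The ratio definitions in the table above can be checked by hand on a few hypothetical records. The sketch below uses plain dictionaries with a single transaction type and a `premium` column as the `percent_of` denominator; it follows the stated formulas only and does not reproduce the library's grouping, filtering, or interval logic.

```python
# Hypothetical fully exposed records with transactions already attached
records = [
    {"exposure": 1.0, "trx_n": 2, "trx_amt": 300.0, "premium": 1000.0},
    {"exposure": 1.0, "trx_n": 0, "trx_amt": 0.0, "premium": 2000.0},
    {"exposure": 1.0, "trx_n": 1, "trx_amt": 100.0, "premium": 1000.0},
    {"exposure": 1.0, "trx_n": 0, "trx_amt": 0.0, "premium": 4000.0},
]

trx_n = sum(r["trx_n"] for r in records)
trx_amt = sum(r["trx_amt"] for r in records)
trx_flag = sum(1 for r in records if r["trx_amt"] != 0)
exposure = sum(r["exposure"] for r in records)

avg_trx = trx_amt / trx_flag           # mean amount when a transaction occurs
avg_all = trx_amt / exposure           # mean amount over all records
trx_freq = trx_n / trx_flag            # transactions per period with activity
trx_utilization = trx_flag / exposure  # share of periods with activity

# percent_of="premium": denominators with and without the transaction filter
premium_w_trx = sum(r["premium"] for r in records if r["trx_amt"] != 0)
premium = sum(r["premium"] for r in records)
pct_of_premium_w_trx = trx_amt / premium_w_trx
pct_of_premium_all = trx_amt / premium
```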

Examples

import actxps as xp
census = xp.load_census_dat()
withdrawals = xp.load_withdrawals()
expo = xp.ExposedDF.expose_py(census, "2019-12-31",
                              target_status="Surrender")
expo.add_transactions(withdrawals)

expo.group_by('inc_guar').trx_stats(percent_of="premium",
                                    combine_trx=True,
                                    conf_int=True)

Transaction study results

Groups: inc_guar
Study range: 1900-01-01 to 2019-12-31
Transaction types: Base, Rider
Transactions as % of: premium

shape: (2, 21)
┌──────────┬──────────┬─────────┬──────────┬───┬────────────┬────────────┬────────────┬────────────┐
│ inc_guar ┆ trx_type ┆ trx_n   ┆ trx_flag ┆ … ┆ pct_of_pre ┆ pct_of_pre ┆ pct_of_pre ┆ pct_of_pre │
│ ---      ┆ ---      ┆ ---     ┆ ---      ┆   ┆ mium_w_trx ┆ mium_w_trx ┆ mium_all_l ┆ mium_all_u │
│ bool     ┆ str      ┆ f64     ┆ u32      ┆   ┆ _lower     ┆ _upper     ┆ ower       ┆ pper       │
│          ┆          ┆         ┆          ┆   ┆ ---        ┆ ---        ┆ ---        ┆ ---        │
│          ┆          ┆         ┆          ┆   ┆ f64        ┆ f64        ┆ f64        ┆ f64        │
╞══════════╪══════════╪═════════╪══════════╪═══╪════════════╪════════════╪════════════╪════════════╡
│ false    ┆ All      ┆ 52939.0 ┆ 24703    ┆ … ┆ 0.027557   ┆ 0.028621   ┆ 0.014253   ┆ 0.014861   │
│ true     ┆ All      ┆ 84882.0 ┆ 39462    ┆ … ┆ 0.055607   ┆ 0.057363   ┆ 0.029064   ┆ 0.030067   │
└──────────┴──────────┴─────────┴──────────┴───┴────────────┴────────────┴────────────┴────────────┘

ungroup

expose.ExposedDF.ungroup()

Remove all grouping variables for summary methods like exp_stats() and trx_stats().
