Convert a data frame of census-level records into an object with exposure-level records.
Parameters
Name
Type
Description
Default
data
polars.polars.DataFrame | pandas.pandas.DataFrame
A data frame with census-level records
required
end_date
datetime.date | str
Experience study end date. If a string is passed, it must be in %Y-%m-%d format.
required
start_date
datetime.date | str
Experience study start date. If a string is passed, it must be in %Y-%m-%d format.
date(1900, 1, 1)
target_status
str | list | numpy.numpy.ndarray
Target status values
None
cal_expo
bool
Set to True for calendar year exposures. Otherwise policy year exposures are assumed.
False
expo_length
(year, quarter, month, week)
Exposure period length
'year'
col_pol_num
str
Name of the column in data containing the policy number
'pol_num'
col_status
str
Name of the column in data containing the policy status
'status'
col_issue_date
str
Name of the column in data containing the issue date
'issue_date'
col_term_date
str
Name of the column in data containing the termination date
'term_date'
default_status
str
Default active status code. If None, the most common status is assumed.
None
Attributes
Name
Type
Description
data
polars.polars.DataFrame
A Polars data frame with exposure level records. The results include all existing columns in the original input data plus new columns for exposures and observation periods. Observation periods include counters for policy exposures, start dates, and end dates. Both start dates and end dates are inclusive bounds. For policy year exposures, two observation period columns are returned. Columns beginning with (pol_) are integer policy periods. Columns beginning with (pol_date_) are calendar dates representing anniversary dates, monthiversary dates, etc.
Values passed on class instantiation. See Parameters for definitions.
exposure_type
str
A description of the exposure type that combines the cal_expo and expo_length properties
date_cols
tuple
Names of the start and end date columns in data for each exposure period
trx_types
list
List of transaction types that have been attached to data using the add_transactions() method.
Notes
Census-level data refers to a data set wherein there is one row per unique policy. Exposure-level data expands census-level data such that there is one record per policy per observation period. Observation periods could be any meaningful period of time such as a policy year, policy month, calendar year, calendar quarter, calendar month, etc.
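The expansion from census-level to exposure-level records can be sketched in plain Python. This is only an illustration for policy year exposures, not the library's implementation: the helper names (add_years, expand_policy_years) and the end-date column name pol_date_yr_end are assumptions made for this example.

```python
from datetime import date, timedelta


def add_years(d: date, years: int) -> date:
    """Shift a date by whole years, mapping Feb 29 to Feb 28 when needed."""
    try:
        return d.replace(year=d.year + years)
    except ValueError:
        return d.replace(year=d.year + years, day=28)


def expand_policy_years(pol_num, issue_date, term_date, end_date):
    """Return one record per policy year (hypothetical helper)."""
    # Observation stops at the earlier of termination and study end
    last_date = min(term_date, end_date) if term_date else end_date
    rows = []
    pol_yr = 1
    start = issue_date
    while start <= last_date:
        next_anniversary = add_years(issue_date, pol_yr)
        rows.append({
            "pol_num": pol_num,
            "pol_yr": pol_yr,
            "pol_date_yr": start,
            # End dates are inclusive, so stop one day before the anniversary
            "pol_date_yr_end": next_anniversary - timedelta(days=1),
        })
        start = next_anniversary
        pol_yr += 1
    return rows


# One census row becomes three exposure rows (policy years 1 to 3)
exposure_rows = expand_policy_years(
    pol_num=1,
    issue_date=date(2020, 3, 15),
    term_date=date(2022, 6, 1),
    end_date=date(2022, 12, 31),
)
```

Each returned row repeats the policy number and adds a policy year counter plus inclusive start and end dates, mirroring the observation period columns described under Attributes.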
target_status is used in the calculation of exposures. The annual exposure method is applied, which allocates a full period of exposure for any statuses in target_status. For all other statuses, new entrants and exits are partially exposed based on the time elapsed in the observation period. This method is consistent with the Balducci Hypothesis, which assumes that the probability of termination is proportionate to the time elapsed in the observation period. If the annual exposure method isn’t desired, target_status can be ignored. In this case, partial exposures are always applied regardless of status.
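The effect of target_status on a single observation period can be sketched as follows. This is a simplified illustration that only handles exits (not new entrants or truncation at the study end date), and it assumes days are counted with inclusive bounds; period_exposure is a hypothetical helper, not part of the package.

```python
from datetime import date


def period_exposure(period_start, period_end, status,
                    term_date=None, target_status=()):
    """Exposure for one observation period (illustrative only)."""
    period_days = (period_end - period_start).days + 1
    terminated_here = (
        term_date is not None and period_start <= term_date <= period_end
    )
    if not terminated_here:
        return 1.0  # policy was in force for the whole period
    if status in target_status:
        return 1.0  # annual exposure method: full period for target statuses
    # All other exits are exposed only for the time elapsed in the period
    elapsed_days = (term_date - period_start).days + 1
    return elapsed_days / period_days


year_start, year_end = date(2022, 1, 1), date(2022, 12, 31)

# A surrender on July 1 gets a full year when "Surrender" is a target status
surrender_expo = period_exposure(
    year_start, year_end, "Surrender",
    term_date=date(2022, 7, 1), target_status=("Surrender",),
)

# A death on the same date is not a target status, so exposure is partial
death_expo = period_exposure(
    year_start, year_end, "Death",
    term_date=date(2022, 7, 1), target_status=("Surrender",),
)
```

Passing an empty target_status makes every exit partially exposed, which corresponds to ignoring target_status as described above.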
default_status is used to indicate the default active status that should be used when exposure records are created. If None, then the most common status will be assumed.
Convenience constructor functions for specific exposure calculations. The two characters after the underscore describe the exposure type and exposure period, respectively.
For exposure types:
- p refers to policy years
- c refers to calendar years
For exposure periods:
- y = years
- q = quarters
- m = months
- w = weeks
Each constructor has the same inputs as the __init__ method except that the expo_length and cal_expo arguments are prepopulated.
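The naming convention can be made concrete with a small lookup. The function below is a hypothetical illustration of how a name such as expose_cq maps to the cal_expo and expo_length arguments; it is not part of the package.

```python
# First letter after the underscore: exposure type (value is the cal_expo flag)
EXPOSURE_TYPES = {"p": False, "c": True}
# Second letter: exposure period (value is the expo_length argument)
EXPOSURE_PERIODS = {"y": "year", "q": "quarter", "m": "month", "w": "week"}


def decode_constructor(name: str) -> dict:
    """Translate a constructor name into its prepopulated arguments."""
    prefix, _, suffix = name.partition("_")
    if prefix != "expose" or len(suffix) != 2:
        raise ValueError(f"{name!r} does not follow the expose_XY pattern")
    expo_type, expo_period = suffix
    return {
        "cal_expo": EXPOSURE_TYPES[expo_type],
        "expo_length": EXPOSURE_PERIODS[expo_period],
    }


calendar_quarter = decode_constructor("expose_cq")
policy_month = decode_constructor("expose_pm")
```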
from_DataFrame() Convert a data frame that already has exposure-level records into an ExposedDF object.
References
Atkinson and McGarry (2016). Experience Study Calculations
A data frame containing transaction details. This data frame must have columns for policy numbers, transaction dates, transaction types, and transaction amounts.
required
col_pol_num
str
Name of the column in trx_data containing the policy number
'pol_num'
col_trx_date
str
Name of the column in trx_data containing the transaction date
'trx_date'
col_trx_type
str
Name of the column in trx_data containing the transaction type
'trx_type'
col_trx_amt
str
Name of the column in trx_data containing the transaction amount
'trx_amt'
Notes
This function attaches transactions to an ExposedDF object. Transactions are grouped and summarized such that the number of rows in the data does not change. Two columns are added to the output for each transaction type. These columns have names of the pattern trx_n_{*} (transaction counts) and trx_amt_{*} (transaction amounts). The trx_types property is updated to include the new transaction types found in trx_data.
Transactions are associated with the data object by matching transaction dates with the exposure date ranges found in the ExposedDF.
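The matching and summarizing logic described in these notes can be sketched with plain Python lists. This is an assumption-laden illustration (the start and end keys, the sample data, and attach_transactions are all made up for the example), not the package's own code.

```python
from datetime import date

# Hypothetical exposure rows with inclusive start and end dates
exposures = [
    {"pol_num": 1, "pol_yr": 1,
     "start": date(2020, 1, 1), "end": date(2020, 12, 31)},
    {"pol_num": 1, "pol_yr": 2,
     "start": date(2021, 1, 1), "end": date(2021, 12, 31)},
]

transactions = [
    {"pol_num": 1, "trx_date": date(2020, 3, 1),
     "trx_type": "Base", "trx_amt": 100.0},
    {"pol_num": 1, "trx_date": date(2020, 9, 1),
     "trx_type": "Base", "trx_amt": 50.0},
    {"pol_num": 1, "trx_date": date(2021, 2, 1),
     "trx_type": "Rider", "trx_amt": 25.0},
]


def attach_transactions(exposures, transactions):
    """Add trx_n_{type} and trx_amt_{type} columns without adding rows."""
    trx_types = sorted({t["trx_type"] for t in transactions})
    result = []
    for row in exposures:
        new_row = dict(row)
        for trx_type in trx_types:
            # Match on policy number, type, and the exposure date range
            matched = [
                t["trx_amt"] for t in transactions
                if t["pol_num"] == row["pol_num"]
                and t["trx_type"] == trx_type
                and row["start"] <= t["trx_date"] <= row["end"]
            ]
            new_row[f"trx_n_{trx_type}"] = len(matched)
            new_row[f"trx_amt_{trx_type}"] = sum(matched)
        result.append(new_row)
    return result, trx_types


with_trx, trx_types = attach_transactions(exposures, transactions)
```

The row count is unchanged, and each row gains one count column and one amount column per transaction type.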
Create a summary of termination experience for a given target status (an ExpStats object).
Parameters
Name
Type
Description
Default
target_status
str | list | numpy.numpy.ndarray
A single string, list, or array of target status values
None
expected
str | list | numpy.numpy.ndarray
A single string, list, or array of column names in the data property with expected values
None
wt
str
Name of the column in the data property containing weights to use in the calculation of claims, exposures, and partial credibility.
None
conf_int
bool
If True, the output will include confidence intervals around the observed termination rates and any actual-to-expected ratios.
False
credibility
bool
Whether the output should include partial credibility weights and credibility-weighted decrement rates.
False
conf_level
float
Confidence level under the Limited Fluctuation credibility method
0.95
cred_r
float
Error tolerance under the Limited Fluctuation credibility method
0.05
col_exposure
str
Name of the column in data containing exposures. Only necessary for SplitExposedDF objects.
'exposure'
Notes
If the ExposedDF object is grouped (see the group_by() method), the returned ExpStats object’s data will contain one row per group.
If nothing is passed to target_status, the target_status property of the ExposedDF object will be used. If that property is None, all status values except the first level will be assumed. This will produce a warning message.
Expected values
The expected argument is optional. If provided, this argument must be a string, list, or array with values corresponding to columns in the data property containing expected experience. More than one expected basis can be provided.
Confidence intervals
If conf_int is set to True, the output will contain lower and upper confidence interval limits for the observed termination rate and any actual-to-expected ratios. The confidence level is dictated by conf_level. If no weighting variable is passed to wt, confidence intervals will be constructed assuming a binomial distribution of claims. Otherwise, confidence intervals will be calculated assuming that the aggregate claims distribution is normal with a mean equal to observed claims and a variance equal to:
Var(S) = E(N) * Var(X) + E(X)**2 * Var(N),
where S is the aggregate claim random variable, X is the weighting variable assumed to follow a normal distribution, and N is a binomial random variable for the number of claims.
If credibility is True and expected values are passed to expected, the output will also contain confidence intervals for any credibility-weighted termination rates.
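As a worked illustration of the aggregate variance formula, the sketch below computes Var(S) and normal-approximation limits for aggregate claims. The function name, its inputs, and the sample numbers are assumptions for this example; how the package scales these limits into rate intervals is not shown here.

```python
import math
from statistics import NormalDist


def aggregate_claims_interval(n_exposed, q, mean_x, var_x, conf_level=0.95):
    """Normal-approximation limits for aggregate claims S (illustrative)."""
    # N is binomial(n_exposed, q)
    e_n = n_exposed * q
    var_n = n_exposed * q * (1 - q)
    # Var(S) = E(N) * Var(X) + E(X)**2 * Var(N)
    var_s = e_n * var_x + mean_x ** 2 * var_n
    mean_s = e_n * mean_x
    # Two-sided critical value for the requested confidence level
    z = NormalDist().inv_cdf(1 - (1 - conf_level) / 2)
    half_width = z * math.sqrt(var_s)
    return mean_s - half_width, mean_s + half_width, var_s


# 1,000 exposed records, a 5% claim rate, weights with mean 100 and variance 400
lower, upper, var_s = aggregate_claims_interval(1000, 0.05, 100.0, 400.0)
```

With these inputs E(N) = 50 and Var(N) = 47.5, so Var(S) = 50 * 400 + 100**2 * 47.5 = 495,000.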
Credibility
If credibility is set to True, the output will contain a credibility column equal to the partial credibility estimate under the Limited Fluctuation credibility method (also known as Classical Credibility) assuming a binomial distribution of claims.
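A minimal sketch of a Limited Fluctuation partial credibility factor under a binomial claims assumption follows. It uses the textbook square-root rule; the function names are hypothetical and the package's exact calculation may differ in detail.

```python
import math
from statistics import NormalDist


def partial_credibility(n_claims, q_obs, conf_level=0.95, cred_r=0.05):
    """Square-root rule for partial credibility (illustrative)."""
    if q_obs >= 1:
        return 1.0  # no binomial variance left, so treat as fully credible
    z = NormalDist().inv_cdf(1 - (1 - conf_level) / 2)
    # Claims needed for the observed rate to fall within cred_r of its mean
    # with probability conf_level, assuming binomial claims
    full_credibility_claims = (z / cred_r) ** 2 * (1 - q_obs)
    return min(1.0, math.sqrt(n_claims / full_credibility_claims))


def credibility_weighted_rate(q_obs, q_expected, credibility):
    """Blend observed and expected rates using the credibility factor."""
    return credibility * q_obs + (1 - credibility) * q_expected


z_partial = partial_credibility(n_claims=400, q_obs=0.05)
z_full = partial_credibility(n_claims=5000, q_obs=0.05)
blended = credibility_weighted_rate(0.05, 0.04, 0.5)
```

With the default conf_level and cred_r, roughly 1,460 claims are needed for full credibility at a 5% observed rate, so 400 claims yield a factor of about 0.52.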
Returns
Type
Description
ExpStats
An ExpStats object with a data property that includes columns for any grouping variables, claims, exposures, and observed decrement rates (q_obs). If any values are passed to expected, additional columns will be added for expected decrements and actual-to-expected ratios. If credibility is set to True, additional columns are added for partial credibility and credibility-weighted decrement rates (assuming values are passed to expected). If conf_int is set to True, additional columns are added for lower and upper confidence interval limits around the observed termination rates and any actual-to-expected ratios. Additionally, if credibility is True and expected values are passed to expected, the output will contain confidence intervals around credibility-weighted termination rates. Confidence interval columns include the name of the original output column suffixed by either _lower or _upper. If a value is passed to wt, additional columns are created containing the sum of weights (.weight), the sum of squared weights (.weight_qs), and the number of records (.weight_n).
References
Herzog, Thomas (1999). Introduction to Credibility Theory
Examples
import actxps as xp

(xp.ExposedDF(xp.load_census_dat(),
              "2019-12-31",
              target_status="Surrender").
    group_by('pol_yr', 'inc_guar').
    exp_stats(conf_int=True))
Split calendar period exposures that cross a policy anniversary into a pre-anniversary record and a post-anniversary record.
Returns
Type
Description
SplitExposedDF
A subclass of ExposedDF with calendar period exposures split by policy year.
Notes
The ExposedDF must have calendar year, quarter, month, or week exposure records. Calendar year exposures are created by passing cal_expo=True to ExposedDF (or alternatively, with the class methods ExposedDF.expose_cy(), ExposedDF.expose_cq(), ExposedDF.expose_cm(), and ExposedDF.expose_cw()).
After splitting, the resulting data will contain both calendar exposures and policy year exposures. These columns will be named ‘exposure_cal’ and ‘exposure_pol’, respectively. Calendar exposures will be in the original units passed to SplitExposedDF(). Policy exposures will always be expressed in years. Downstream functions like exp_stats() and exp_shiny() will require clarification as to which exposure basis should be used to summarize results.
After splitting, the column ‘pol_yr’ will contain policy years.
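The splitting step can be illustrated for a single calendar year record. The sketch below assumes the record starts on January 1, the policy was issued in an earlier year, and calendar exposure is measured in years; split_calendar_year and its output keys are hypothetical names for this example.

```python
from datetime import date, timedelta


def split_calendar_year(cal_yr_start, issue_date):
    """Split one calendar-year record at the policy anniversary."""
    cal_yr_end = date(cal_yr_start.year, 12, 31)
    days_in_year = (cal_yr_end - cal_yr_start).days + 1
    try:
        anniversary = issue_date.replace(year=cal_yr_start.year)
    except ValueError:  # Feb 29 issue date in a non-leap year
        anniversary = date(cal_yr_start.year, 2, 28)
    # Policy year in force just before this year's anniversary
    pol_yr_before = cal_yr_start.year - issue_date.year

    if anniversary <= cal_yr_start:
        # Anniversary falls on January 1, so no split is needed
        pieces = [(cal_yr_start, cal_yr_end, pol_yr_before + 1)]
    else:
        pieces = [
            (cal_yr_start, anniversary - timedelta(days=1), pol_yr_before),
            (anniversary, cal_yr_end, pol_yr_before + 1),
        ]
    return [
        {"start": start, "end": end, "pol_yr": pol_yr,
         "exposure_cal": ((end - start).days + 1) / days_in_year}
        for start, end, pol_yr in pieces
    ]


# A policy issued 2019-04-10 crosses its third anniversary during 2022
pieces = split_calendar_year(date(2022, 1, 1), date(2019, 4, 10))
```

The pre-anniversary piece belongs to policy year 3 and the post-anniversary piece to policy year 4, and the two calendar exposures add up to one full year.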
Examples
import actxps as xp
toy_census = xp.load_toy_census()
expo = xp.ExposedDF.expose_cy(toy_census, "2022-12-31")
expo.expose_split()
The input data frame must have columns for policy numbers, statuses, exposures, policy periods (for policy exposures only), and exposure start / end dates. Optionally, if data has transaction counts and amounts by type, these can be specified without calling add_transactions().
Parameters
Name
Type
Description
Default
data
polars.polars.DataFrame | pandas.pandas.DataFrame
A data frame with exposure-level records
required
end_date
datetime.date | str
Experience study end date
required
start_date
datetime.date | str
Experience study start date
date(1900, 1, 1)
target_status
str | list | numpy.numpy.ndarray
Target status values
None
cal_expo
bool
Set to True for calendar year exposures. Otherwise policy year exposures are assumed.
False
expo_length
str
Exposure period length. Must be ‘year’, ‘quarter’, ‘month’, or ‘week’
'year'
trx_types
list | str
List containing unique transaction types that have been attached to data. For each value in trx_types, from_DataFrame requires that columns exist in data named trx_n_{*} and trx_amt_{*} containing transaction counts and amounts, respectively. The prefixes “trx_n_” and “trx_amt_” can be overridden using the col_trx_n_ and col_trx_amt_ arguments.
None
col_pol_num
str
Name of the column in data containing the policy number
'pol_num'
col_status
str
Name of the column in data containing the policy status
'status'
col_exposure
str
Name of the column in data containing exposures.
'exposure'
col_pol_per
str
Name of the column in data containing policy exposure periods. Only necessary if cal_expo is False. The assumed default is either “pol_yr”, “pol_qtr”, “pol_mth”, or “pol_wk” depending on the value of expo_length.
None
cols_dates
str
Names of the columns in data containing exposure start and end dates. Both date ranges are assumed to be exclusive. The assumed default is of the form A_B. A is “cal” if cal_expo is True or “pol” otherwise. B is either “yr”, “qtr”, “mth”, or “wk” depending on the value of expo_length.
None
col_trx_n_
str
Prefix to use for columns containing transaction counts.
"trx_n_"
col_trx_amt_
str
Prefix to use for columns containing transaction amounts.
"trx_amt_"
default_status
str
Default active status code
None
Returns
Type
Description
actxps.expose.ExposedDF
An ExposedDF object.
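The column naming requirement for trx_types can be checked before calling from_DataFrame(). The helpers below are a hypothetical pre-flight check written for this example; they only build the expected column names from the prefixes described above and are not part of the package.

```python
def required_trx_columns(trx_types, col_trx_n_="trx_n_",
                         col_trx_amt_="trx_amt_"):
    """List the count and amount columns expected for each type."""
    if isinstance(trx_types, str):
        trx_types = [trx_types]  # a single type may be passed as a string
    columns = []
    for trx_type in trx_types:
        columns.append(f"{col_trx_n_}{trx_type}")
        columns.append(f"{col_trx_amt_}{trx_type}")
    return columns


def missing_trx_columns(data_columns, trx_types, **prefixes):
    """Return the expected transaction columns that are absent."""
    return [
        col for col in required_trx_columns(trx_types, **prefixes)
        if col not in data_columns
    ]


data_columns = ["pol_num", "status", "exposure",
                "trx_n_Base", "trx_amt_Base", "trx_n_Rider"]
missing = missing_trx_columns(data_columns, ["Base", "Rider"])
```

Here the check reports that trx_amt_Rider is missing, which would need to be added or renamed before the data frame is converted.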
group_by
expose.ExposedDF.group_by(*by)
Set grouping variables for summary methods like exp_stats() and trx_stats().
Parameters
Name
Type
Description
Default
*by
Column names in data that will be used as grouping variables
()
Notes
This function will not directly apply the DataFrame.group_by() method to the data property. Instead, it will set the groups property of the ExposedDF object. The groups property is subsequently used to group data within summary methods like exp_stats() and trx_stats().
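The deferred grouping pattern described here can be sketched with a toy class. GroupedRecords and total_exposure are invented for this illustration, and the assumption that calling group_by() with no arguments clears the grouping is mine rather than something stated above.

```python
from collections import defaultdict


class GroupedRecords:
    """Toy container that stores grouping variables without applying them."""

    def __init__(self, rows):
        self.data = rows
        self.groups = None

    def group_by(self, *by):
        # Only record the grouping variables; the data is left untouched
        self.groups = list(by) if by else None
        return self

    def total_exposure(self):
        # Summary methods read the stored groups when aggregating
        totals = defaultdict(float)
        for row in self.data:
            key = tuple(row[g] for g in (self.groups or []))
            totals[key] += row["exposure"]
        return dict(totals)


rows = [
    {"pol_yr": 1, "exposure": 1.0},
    {"pol_yr": 1, "exposure": 0.5},
    {"pol_yr": 2, "exposure": 1.0},
]
records = GroupedRecords(rows).group_by("pol_yr")
by_year = records.total_exposure()
overall = records.group_by().total_exposure()
```

Because group_by() returns the object itself, it can be chained directly into a summary call, as in the exp_stats() example.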
Create a summary of transaction counts, amounts, and utilization rates (a TrxStats object).
Parameters
Name
Type
Description
Default
trx_types
list | str
A list of transaction types to include in the output. If None is provided, all available transaction types in the trx_types property will be used.
None
percent_of
list | str
A list containing column names in the data property to use as denominators in the calculation of utilization rates or actual-to-expected ratios.
None
combine_trx
bool
If False (default), the results will contain output rows for each transaction type. If True, the results will contain aggregated results across all transaction types.
False
full_exposures_only
bool
If True (default), partially exposed records will be ignored in the results.
True
conf_int
bool
If True, the output will include confidence intervals around the observed utilization rate and any percent_of output columns.
False
conf_level
float
Confidence level for confidence intervals
0.95
col_exposure
str
Name of the column in the data property containing exposures. Only necessary for SplitExposedDF objects.
'exposure'
Notes
If the ExposedDF object is grouped (see the group_by() method), the returned TrxStats object’s data will contain one row per group.
Any number of transaction types can be passed to the trx_types argument; however, each transaction type must appear in the trx_types property of the ExposedDF object. In addition, trx_stats() expects to see columns named trx_n_{*} (for transaction counts) and trx_amt_{*} (for transaction amounts) for each transaction type. To ensure data is in the appropriate format, use the class method ExposedDF.from_DataFrame() to convert an existing data frame with transactions or use add_transactions() to attach transactions to an existing ExposedDF object.
“Percentage of” calculations
The percent_of argument is optional. If provided, this argument must be a list with values corresponding to columns in the data property containing values to use as denominators in the calculation of utilization rates or actual-to-expected ratios. Example usage:
In a study of partial withdrawal transactions, if percent_of refers to account values, observed withdrawal rates can be determined.
In a study of recurring claims, if percent_of refers to a column containing a maximum benefit amount, utilization rates can be determined.
Confidence intervals
If conf_int is set to True, the output will contain lower and upper confidence interval limits for the observed utilization rate and any percent_of output columns. The confidence level is dictated by conf_level.
Intervals for the utilization rate (trx_util) assume a binomial distribution.
Intervals for transactions as a percentage of another column with non-zero transactions (pct_of_{*}_w_trx) are constructed using a normal distribution.
Intervals for transactions as a percentage of another column regardless of transaction utilization (pct_of_{*}_all) are calculated assuming that the aggregate distribution is normal with a mean equal to observed transactions and a variance equal to:
Var(S) = E(N) * Var(X) + E(X)**2 * Var(N),
where S is the aggregate transactions random variable, X is an individual transaction amount assumed to follow a normal distribution, and N is a binomial random variable for transaction utilization.
Default removal of partial exposures
As a default, partial exposures are removed from data before summarizing results. This is done to avoid complexity associated with a lopsided skew in the timing of transactions. For example, if transactions can occur on a monthly basis or annually at the beginning of each policy year, partial exposures may not be appropriate. If a policy had an exposure of 0.5 years and was taking withdrawals annually at the beginning of the year, an argument could be made that the exposure should instead be 1 complete year. If the same policy was expected to take withdrawals 9 months into the year, it’s not clear if the exposure should be 0.5 years or 0.5 / 0.75 years. To override this treatment, set full_exposures_only to False.
Returns
Type
Description
TrxStats
A TrxStats object with a data property that includes columns for any grouping variables and transaction types, plus the following:
- trx_n: the number of unique transactions
- trx_amt: total transaction amount
- trx_flag: the number of observation periods with non-zero transaction amounts
- exposure: total exposures
- avg_trx: mean transaction amount (trx_amt / trx_flag)
- avg_all: mean transaction amount over all records (trx_amt / exposure)
- trx_freq: transaction frequency when a transaction occurs (trx_n / trx_flag)
- trx_util: transaction utilization per observation period (trx_flag / exposure)
If percent_of is provided, the results will also include:
- The sum of any columns passed to percent_of with non-zero transactions. These columns include the suffix _w_trx.
- The sum of any columns passed to percent_of
- pct_of_{*}_w_trx: total transactions as a percentage of column {*}_w_trx
- pct_of_{*}_all: total transactions as a percentage of column {*}
If conf_int is set to True, additional columns are added for lower and upper confidence interval limits around the observed utilization rate and any percent_of output columns. Confidence interval columns include the name of the original output column suffixed by either _lower or _upper. If values are passed to percent_of, an additional column is created containing the sum of squared transaction amounts (trx_amt_sq).
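The ratios listed for the returned columns can be reproduced by hand on a few records. The sketch below is an illustration with made-up data and a hypothetical summarize_transactions helper; the utilization column is named trx_util here, matching the name used in the confidence interval notes.

```python
records = [
    {"exposure": 1.0, "trx_n": 2, "trx_amt": 150.0},
    {"exposure": 1.0, "trx_n": 0, "trx_amt": 0.0},
    {"exposure": 0.5, "trx_n": 1, "trx_amt": 40.0},  # partially exposed
    {"exposure": 1.0, "trx_n": 1, "trx_amt": 50.0},
]


def summarize_transactions(records, full_exposures_only=True):
    """Compute the summary ratios described above (illustrative)."""
    if full_exposures_only:
        # Drop partially exposed records before summarizing
        records = [r for r in records if abs(r["exposure"] - 1.0) < 1e-9]
    trx_n = sum(r["trx_n"] for r in records)
    trx_amt = sum(r["trx_amt"] for r in records)
    # Number of observation periods with a non-zero transaction amount
    trx_flag = sum(1 for r in records if r["trx_amt"] != 0)
    exposure = sum(r["exposure"] for r in records)
    return {
        "trx_n": trx_n,
        "trx_amt": trx_amt,
        "trx_flag": trx_flag,
        "exposure": exposure,
        "avg_trx": trx_amt / trx_flag if trx_flag else None,
        "avg_all": trx_amt / exposure if exposure else None,
        "trx_freq": trx_n / trx_flag if trx_flag else None,
        "trx_util": trx_flag / exposure if exposure else None,
    }


stats = summarize_transactions(records)
stats_all = summarize_transactions(records, full_exposures_only=False)
```

With the default setting the half-year record is excluded, so three records remain and two of them have transactions, giving a utilization rate of 2/3.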