import actxps as xp
import polars as pl
toy_census = xp.load_toy_census()
toy_census
| pol_num | status | issue_date | term_date | 
|---|---|---|---|
| i64 | cat | date | date | 
| 1 | "Active" | 2010-01-01 | null | 
| 2 | "Death" | 2011-05-27 | 2020-09-14 | 
| 3 | "Surrender" | 2009-11-10 | 2022-02-25 | 
Census-level data refers to a data set wherein there is one row per policy. Exposure-level data expands census-level data such that there is one record per policy per observation period. Observation periods could be any meaningful period of time such as a policy year, policy month, calendar year, calendar quarter, calendar month, etc.
A common step in experience studies is converting census-level data into exposure-level data. The ExposedDF class performs this task. Specifically, this class:
- For terminated policies, shows an active status and sets the termination date to NA for all periods except the last.
If you already have exposure-level data available, the class method ExposedDF.from_DataFrame() can be used to convert a data frame into an ExposedDF object.
To get started, we’re going to use a toy census data frame from the actxps package that contains 3 policies: one active, one that terminated due to death, and one that terminated due to surrender.
toy_census contains the 4 columns necessary to compute exposures:
- pol_num: a unique identifier for individual policies
- status: the policy status
- issue_date: issue date
- term_date: termination date, if any. Otherwise NA
toy_census is a Polars data frame. Actxps functions accept both Polars and Pandas data frames. For speed and efficiency reasons, Polars is used internally for all data wrangling, so if a Pandas data frame is passed to an actxps function it will be converted to Polars. To convert a Polars data frame to Pandas the method DataFrame.to_pandas() is available.
Let’s assume we’re performing an experience study as of 2022-12-31 and we’re interested in policy year exposures. Here’s what we should expect for our 3 policies.
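The expected number of policy-year rows can be worked out from the dates alone. The sketch below uses only the standard library; `policy_year()` is a hypothetical helper written for this illustration, not an actxps function, and it ignores the February 29 issue-date edge case.

```python
from datetime import date

def policy_year(issue_date: date, as_of: date) -> int:
    """Policy year in force on `as_of` (1 = the year starting at issue)."""
    years = as_of.year - issue_date.year
    # Add one once the anniversary has been reached in the as-of year
    reached = (as_of.month, as_of.day) >= (issue_date.month, issue_date.day)
    return years + (1 if reached else 0)

study_end = date(2022, 12, 31)
# (issue date, termination date or None) for the three toy policies
policies = {
    1: (date(2010, 1, 1), None),
    2: (date(2011, 5, 27), date(2020, 9, 14)),
    3: (date(2009, 11, 10), date(2022, 2, 25)),
}
# Exposure stops at the earlier of termination and the study end date
expected_rows = {
    pol: policy_year(issue, min(term or study_end, study_end))
    for pol, (issue, term) in policies.items()
}
print(expected_rows)  # {1: 13, 2: 10, 3: 13}
```

Under these assumptions, policies 1 and 3 should each produce 13 policy-year records and policy 2 should produce 10.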
To calculate exposures, we pass our data to ExposedDF() and we specify a study end_date.
This creates a new ExposedDF object, which contains a data property and additional attributes related to the experience study.
Let’s examine what happened to each policy.
Policy 1: As expected, there are 13 rows for this policy. New columns were added for the policy year (pol_yr), date ranges (pol_date_yr, pol_date_yr_end), and exposure. All exposures are 100% since this policy was active for all 13 years.
When the data is printed, additional attributes from the ExposedDF class are displayed.
| pol_num | status | issue_date | term_date | pol_yr | pol_date_yr | pol_date_yr_end | exposure | 
|---|---|---|---|---|---|---|---|
| i64 | enum | date | date | u32 | date | date | f64 | 
| 1 | "Active" | 2010-01-01 | null | 1 | 2010-01-01 | 2010-12-31 | 1.0 | 
| 1 | "Active" | 2010-01-01 | null | 2 | 2011-01-01 | 2011-12-31 | 1.0 | 
| 1 | "Active" | 2010-01-01 | null | 3 | 2012-01-01 | 2012-12-31 | 1.0 | 
| 1 | "Active" | 2010-01-01 | null | 4 | 2013-01-01 | 2013-12-31 | 1.0 | 
| 1 | "Active" | 2010-01-01 | null | 5 | 2014-01-01 | 2014-12-31 | 1.0 | 
| … | … | … | … | … | … | … | … | 
| 1 | "Active" | 2010-01-01 | null | 9 | 2018-01-01 | 2018-12-31 | 1.0 | 
| 1 | "Active" | 2010-01-01 | null | 10 | 2019-01-01 | 2019-12-31 | 1.0 | 
| 1 | "Active" | 2010-01-01 | null | 11 | 2020-01-01 | 2020-12-31 | 1.0 | 
| 1 | "Active" | 2010-01-01 | null | 12 | 2021-01-01 | 2021-12-31 | 1.0 | 
| 1 | "Active" | 2010-01-01 | null | 13 | 2022-01-01 | 2022-12-31 | 1.0 | 
Policy 2: There are 10 rows for this policy. The first 9 periods show the policy in an active status and the termination date (term_date) is set to NA. The last period includes the final status of “Death” and the actual termination date. The last exposure is less than one because roughly a third of a year elapsed between the last anniversary date on 2020-05-27 and the termination date on 2020-09-14.
| pol_num | status | issue_date | term_date | pol_yr | pol_date_yr | pol_date_yr_end | exposure | 
|---|---|---|---|---|---|---|---|
| i64 | enum | date | date | u32 | date | date | f64 | 
| 2 | "Active" | 2011-05-27 | null | 1 | 2011-05-27 | 2012-05-26 | 1.0 | 
| 2 | "Active" | 2011-05-27 | null | 2 | 2012-05-27 | 2013-05-26 | 1.0 | 
| 2 | "Active" | 2011-05-27 | null | 3 | 2013-05-27 | 2014-05-26 | 1.0 | 
| 2 | "Active" | 2011-05-27 | null | 4 | 2014-05-27 | 2015-05-26 | 1.0 | 
| 2 | "Active" | 2011-05-27 | null | 5 | 2015-05-27 | 2016-05-26 | 1.0 | 
| 2 | "Active" | 2011-05-27 | null | 6 | 2016-05-27 | 2017-05-26 | 1.0 | 
| 2 | "Active" | 2011-05-27 | null | 7 | 2017-05-27 | 2018-05-26 | 1.0 | 
| 2 | "Active" | 2011-05-27 | null | 8 | 2018-05-27 | 2019-05-26 | 1.0 | 
| 2 | "Active" | 2011-05-27 | null | 9 | 2019-05-27 | 2020-05-26 | 1.0 | 
| 2 | "Death" | 2011-05-27 | 2020-09-14 | 10 | 2020-05-27 | 2021-05-26 | 0.30411 | 
Policy 3: There are 13 rows for this policy. The first 12 periods show the policy in an active status and the termination date (term_date) is set to NA. The last period includes the final status of “Surrender” and the actual termination date. The last exposure is less than one because roughly a third of a year elapsed between the last anniversary date on 2021-11-10 and the termination date on 2022-02-25.
| pol_num | status | issue_date | term_date | pol_yr | pol_date_yr | pol_date_yr_end | exposure | 
|---|---|---|---|---|---|---|---|
| i64 | enum | date | date | u32 | date | date | f64 | 
| 3 | "Active" | 2009-11-10 | null | 1 | 2009-11-10 | 2010-11-09 | 1.0 | 
| 3 | "Active" | 2009-11-10 | null | 2 | 2010-11-10 | 2011-11-09 | 1.0 | 
| 3 | "Active" | 2009-11-10 | null | 3 | 2011-11-10 | 2012-11-09 | 1.0 | 
| 3 | "Active" | 2009-11-10 | null | 4 | 2012-11-10 | 2013-11-09 | 1.0 | 
| 3 | "Active" | 2009-11-10 | null | 5 | 2013-11-10 | 2014-11-09 | 1.0 | 
| … | … | … | … | … | … | … | … | 
| 3 | "Active" | 2009-11-10 | null | 9 | 2017-11-10 | 2018-11-09 | 1.0 | 
| 3 | "Active" | 2009-11-10 | null | 10 | 2018-11-10 | 2019-11-09 | 1.0 | 
| 3 | "Active" | 2009-11-10 | null | 11 | 2019-11-10 | 2020-11-09 | 1.0 | 
| 3 | "Active" | 2009-11-10 | null | 12 | 2020-11-10 | 2021-11-09 | 1.0 | 
| 3 | "Surrender" | 2009-11-10 | 2022-02-25 | 13 | 2021-11-10 | 2022-11-09 | 0.29589 | 
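The two partial exposures above can be reproduced by hand. The sketch below assumes a day-count convention that includes both end points and divides by the number of days in the policy year. That convention is inferred from the printed values rather than taken from the actxps source, and `partial_exposure()` is a hypothetical helper.

```python
from datetime import date

def partial_exposure(period_start: date, period_end: date, last_day: date) -> float:
    """Fraction of a period exposed, counting both end points."""
    days_exposed = (last_day - period_start).days + 1
    days_in_period = (period_end - period_start).days + 1
    return days_exposed / days_in_period

# Policy 2: tenth policy year, terminated on 2020-09-14 (111 of 365 days)
expo_2 = partial_exposure(date(2020, 5, 27), date(2021, 5, 26), date(2020, 9, 14))
# Policy 3: thirteenth policy year, terminated on 2022-02-25 (108 of 365 days)
expo_3 = partial_exposure(date(2021, 11, 10), date(2022, 11, 9), date(2022, 2, 25))
print(round(expo_2, 5), round(expo_3, 5))  # 0.30411 0.29589
```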
The previous section supplied only data and a study end_date to ExposedDF(). These are the minimum required arguments. Optionally, a start_date can be supplied to drop exposure periods that begin before the specified date.
Exposure data
Exposure type: policy_year
Target status: None
Study range: 2019-12-31 to 2022-12-31
shape: (6, 8)
┌─────────┬───────────┬────────────┬────────────┬────────┬─────────────┬────────────────┬──────────┐
│ pol_num ┆ status    ┆ issue_date ┆ term_date  ┆ pol_yr ┆ pol_date_yr ┆ pol_date_yr_en ┆ exposure │
│ ---     ┆ ---       ┆ ---        ┆ ---        ┆ ---    ┆ ---         ┆ d              ┆ ---      │
│ i64     ┆ enum      ┆ date       ┆ date       ┆ u32    ┆ date        ┆ ---            ┆ f64      │
│         ┆           ┆            ┆            ┆        ┆             ┆ date           ┆          │
╞═════════╪═══════════╪════════════╪════════════╪════════╪═════════════╪════════════════╪══════════╡
│ 1       ┆ Active    ┆ 2010-01-01 ┆ null       ┆ 11     ┆ 2020-01-01  ┆ 2020-12-31     ┆ 1.0      │
│ 1       ┆ Active    ┆ 2010-01-01 ┆ null       ┆ 12     ┆ 2021-01-01  ┆ 2021-12-31     ┆ 1.0      │
│ 1       ┆ Active    ┆ 2010-01-01 ┆ null       ┆ 13     ┆ 2022-01-01  ┆ 2022-12-31     ┆ 1.0      │
│ 2       ┆ Death     ┆ 2011-05-27 ┆ 2020-09-14 ┆ 10     ┆ 2020-05-27  ┆ 2021-05-26     ┆ 0.30411  │
│ 3       ┆ Active    ┆ 2009-11-10 ┆ null       ┆ 12     ┆ 2020-11-10  ┆ 2021-11-09     ┆ 1.0      │
│ 3       ┆ Surrender ┆ 2009-11-10 ┆ 2022-02-25 ┆ 13     ┆ 2021-11-10  ┆ 2022-11-09     ┆ 0.29589  │
└─────────┴───────────┴────────────┴────────────┴────────┴─────────────┴────────────────┴──────────┘
Most experience studies use the annual exposure method, which allocates a full period of exposure for the particular termination event of interest in the scope of the study.
The intuition for this approach is simple: let’s assume we have an unrealistically small study with a single data point for one policy over the course of one year. Let’s assume that policy terminated due to surrender half way through the year.
If we don’t apply the annual exposure method, we would calculate a termination rate as:
\[ q^{surr} = \frac{claims}{exposures} = \frac{1}{0.5} = 200\% \]
A termination rate of 200% doesn’t make any sense. Under the annual exposure method we would see a rate of 100%, which is intuitive.
\[ q^{surr} = \frac{claims}{exposures} = \frac{1}{1} = 100\% \]
The annual exposure method can be applied by passing a target status (a string or a list of statuses) to the target_status argument of ExposedDF().
Let’s assume we are performing a surrender study.
Now let’s verify that the exposure on the surrendered policy increased to 100% in the last exposure period.
| pol_num | status | issue_date | term_date | pol_yr | pol_date_yr | pol_date_yr_end | exposure | 
|---|---|---|---|---|---|---|---|
| i64 | enum | date | date | u32 | date | date | f64 | 
| 1 | "Active" | 2010-01-01 | null | 13 | 2022-01-01 | 2022-12-31 | 1.0 | 
| 2 | "Death" | 2011-05-27 | 2020-09-14 | 10 | 2020-05-27 | 2021-05-26 | 0.30411 | 
| 3 | "Surrender" | 2009-11-10 | 2022-02-25 | 13 | 2021-11-10 | 2022-11-09 | 1.0 | 
The default exposure basis used by ExposedDF() is policy years. Other exposure periods can be selected with the cal_expo and expo_length arguments.
If cal_expo is set to True, calendar year exposures will be calculated.
Looking at the second policy, we can see that the first year is left-censored because the policy was issued two-fifths of the way through the year, and the last period is right-censored because the policy terminated roughly seven-tenths of the way through the year.
exposed_cal = xp.ExposedDF(toy_census, end_date="2022-12-31",
                           cal_expo=True, target_status="Surrender")
exposed_cal.data.filter(pl.col('pol_num') == 2)
| pol_num | status | issue_date | term_date | cal_yr | cal_yr_end | exposure | 
|---|---|---|---|---|---|---|
| i64 | enum | date | date | date | date | f64 | 
| 2 | "Active" | 2011-05-27 | null | 2011-01-01 | 2011-12-31 | 0.6 | 
| 2 | "Active" | 2011-05-27 | null | 2012-01-01 | 2012-12-31 | 1.0 | 
| 2 | "Active" | 2011-05-27 | null | 2013-01-01 | 2013-12-31 | 1.0 | 
| 2 | "Active" | 2011-05-27 | null | 2014-01-01 | 2014-12-31 | 1.0 | 
| 2 | "Active" | 2011-05-27 | null | 2015-01-01 | 2015-12-31 | 1.0 | 
| 2 | "Active" | 2011-05-27 | null | 2016-01-01 | 2016-12-31 | 1.0 | 
| 2 | "Active" | 2011-05-27 | null | 2017-01-01 | 2017-12-31 | 1.0 | 
| 2 | "Active" | 2011-05-27 | null | 2018-01-01 | 2018-12-31 | 1.0 | 
| 2 | "Active" | 2011-05-27 | null | 2019-01-01 | 2019-12-31 | 1.0 | 
| 2 | "Death" | 2011-05-27 | 2020-09-14 | 2020-01-01 | 2020-12-31 | 0.704918 | 
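The first and last exposures for policy 2 can be checked with plain date arithmetic. This sketch assumes inclusive day counts divided by the number of days in the calendar year, which reproduces the printed values. `calendar_year_exposure()` is a hypothetical helper, not part of actxps.

```python
from datetime import date

def calendar_year_exposure(year: int, first_day: date, last_day: date) -> float:
    """Fraction of calendar `year` between first_day and last_day, inclusive."""
    year_start, year_end = date(year, 1, 1), date(year, 12, 31)
    start = max(first_day, year_start)
    end = min(last_day, year_end)
    days_in_year = (year_end - year_start).days + 1  # 365 or 366
    return ((end - start).days + 1) / days_in_year

issue, term = date(2011, 5, 27), date(2020, 9, 14)
first = calendar_year_exposure(2011, issue, term)  # cut short by the issue date
last = calendar_year_exposure(2020, issue, term)   # cut short by the death
print(round(first, 6), round(last, 6))  # 0.6 0.704918
```

Note that 2020 is a leap year, so the final exposure is 258 days out of 366.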
The length of the exposure period can be decreased by passing "quarter", "month", or "week" to the expo_length argument. This can be used with policy or calendar-based exposures.
(xp.ExposedDF(toy_census, end_date="2022-12-31", cal_expo=True,
              expo_length="quarter", target_status="Surrender").
              data.filter(pl.col('pol_num') == 2))
| pol_num | status | issue_date | term_date | cal_qtr | cal_qtr_end | exposure | 
|---|---|---|---|---|---|---|
| i64 | enum | date | date | date | date | f64 | 
| 2 | "Active" | 2011-05-27 | null | 2011-04-01 | 2011-06-30 | 0.384615 | 
| 2 | "Active" | 2011-05-27 | null | 2011-07-01 | 2011-09-30 | 1.0 | 
| 2 | "Active" | 2011-05-27 | null | 2011-10-01 | 2011-12-31 | 1.0 | 
| 2 | "Active" | 2011-05-27 | null | 2012-01-01 | 2012-03-31 | 1.0 | 
| 2 | "Active" | 2011-05-27 | null | 2012-04-01 | 2012-06-30 | 1.0 | 
| … | … | … | … | … | … | … | 
| 2 | "Active" | 2011-05-27 | null | 2019-07-01 | 2019-09-30 | 1.0 | 
| 2 | "Active" | 2011-05-27 | null | 2019-10-01 | 2019-12-31 | 1.0 | 
| 2 | "Active" | 2011-05-27 | null | 2020-01-01 | 2020-03-31 | 1.0 | 
| 2 | "Active" | 2011-05-27 | null | 2020-04-01 | 2020-06-30 | 1.0 | 
| 2 | "Death" | 2011-05-27 | 2020-09-14 | 2020-07-01 | 2020-09-30 | 0.826087 | 
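The same arithmetic carries over to quarters once the quarter boundaries are known. The sketch below is an illustration under the assumption of inclusive day counts; `quarter_bounds()` and `quarter_exposure()` are hypothetical helpers.

```python
from datetime import date, timedelta

def quarter_bounds(d: date) -> tuple[date, date]:
    """First and last day of the calendar quarter containing `d`."""
    first_month = 3 * ((d.month - 1) // 3) + 1
    start = date(d.year, first_month, 1)
    # First day of the next quarter, minus one day
    if first_month == 10:
        next_start = date(d.year + 1, 1, 1)
    else:
        next_start = date(d.year, first_month + 3, 1)
    return start, next_start - timedelta(days=1)

def quarter_exposure(first_day: date, last_day: date, day_in_quarter: date) -> float:
    """Fraction of a quarter between first_day and last_day, inclusive."""
    q_start, q_end = quarter_bounds(day_in_quarter)
    start, end = max(first_day, q_start), min(last_day, q_end)
    return ((end - start).days + 1) / ((q_end - q_start).days + 1)

issue, term = date(2011, 5, 27), date(2020, 9, 14)
print(round(quarter_exposure(issue, term, issue), 6))  # 0.384615 (35 of 91 days)
print(round(quarter_exposure(issue, term, term), 6))   # 0.826087 (76 of 92 days)
```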
The following functions are class methods of ExposedDF() that target a specific exposure type without specifying cal_expo and expo_length.
- ExposedDF.expose_py() = exposures by policy year
- ExposedDF.expose_pq() = exposures by policy quarter
- ExposedDF.expose_pm() = exposures by policy month
- ExposedDF.expose_pw() = exposures by policy week
- ExposedDF.expose_cy() = exposures by calendar year
- ExposedDF.expose_cq() = exposures by calendar quarter
- ExposedDF.expose_cm() = exposures by calendar month
- ExposedDF.expose_cw() = exposures by calendar week
A common technique used in experience studies is to split calendar years into two records: a pre-anniversary record and a post-anniversary record. In actxps, this can be accomplished using the expose_split() method.
Let’s continue examining the second policy. exposed_cal, which contains calendar year exposures, is passed into expose_split(). The resulting data now contains 19 records instead of 10. There is one record for 2011 and 2 records for all other years. The year 2011 only has a single record because the policy was issued in this year, so there can only be a post-anniversary record.
split = exposed_cal.expose_split()
(split.data.
 filter(pl.col('pol_num') == 2).
 select('cal_yr', 'cal_yr_end', 'pol_yr', 'exposure_pol', 'exposure_cal'))
| cal_yr | cal_yr_end | pol_yr | exposure_pol | exposure_cal | 
|---|---|---|---|---|
| date | date | i32 | f64 | f64 | 
| 2011-05-27 | 2011-12-31 | 1 | 0.598361 | 0.6 | 
| 2012-01-01 | 2012-05-26 | 1 | 0.401639 | 0.401639 | 
| 2012-05-27 | 2012-12-31 | 2 | 0.6 | 0.598361 | 
| 2013-01-01 | 2013-05-26 | 2 | 0.4 | 0.4 | 
| 2013-05-27 | 2013-12-31 | 3 | 0.6 | 0.6 | 
| … | … | … | … | … | 
| 2018-05-27 | 2018-12-31 | 8 | 0.6 | 0.6 | 
| 2019-01-01 | 2019-05-26 | 8 | 0.4 | 0.4 | 
| 2019-05-27 | 2019-12-31 | 9 | 0.598361 | 0.6 | 
| 2020-01-01 | 2020-05-26 | 9 | 0.401639 | 0.401639 | 
| 2020-05-27 | 2020-12-31 | 10 | 0.30411 | 0.303279 | 
The output of expose_split() contains two exposure columns.
- exposure_pol contains policy year exposures
- exposure_cal contains calendar year exposures
The two exposure bases will often not match for two reasons:
1. Calendar years and policy years have different start and end dates that may or may not include a leap day. In the first row, the calendar year exposure is 0.6 years of the year 2011, which does not include a leap day. In the same row, the policy year exposure is 0.5984 years of the policy year spanning 2011-05-27 to 2012-05-26, which does include a leap day.
2. Application of the annual exposure method. If the termination event of interest appears on a post-anniversary record, policy exposures will be 1 and calendar exposures will be the fraction of the year spanning the anniversary to December 31st. Conversely, if the termination event of interest appears on a pre-anniversary record, calendar exposures will be 1 and policy exposures will be the fraction of the policy year from January 1st to the last day of the current policy year. While it may sound confusing at first, these rules are important to ensure that the termination event of interest always has an exposure of 1 when the data is grouped on a calendar year or policy year basis.
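The leap-day effect in the first reason can be checked by hand for policy 2's first split record (2011-05-27 through 2011-12-31). The sketch assumes inclusive day counts, which reproduces the printed values; `days_inclusive()` is a hypothetical helper.

```python
from datetime import date

def days_inclusive(start: date, end: date) -> int:
    """Number of days from start to end, counting both end points."""
    return (end - start).days + 1

# The same 219-day segment is measured against two different denominators
seg_days = days_inclusive(date(2011, 5, 27), date(2011, 12, 31))

# Calendar basis: days in calendar year 2011 (365, no leap day)
exposure_cal = seg_days / days_inclusive(date(2011, 1, 1), date(2011, 12, 31))
# Policy basis: days in policy year 1 (366, because it spans 2012-02-29)
exposure_pol = seg_days / days_inclusive(date(2011, 5, 27), date(2012, 5, 26))

print(round(exposure_cal, 6), round(exposure_pol, 6))  # 0.6 0.598361
```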
Some downstream methods like exp_stats() expect ExposedDF objects to have a single column for exposures. For split exposures, the exposure basis must be specified using the col_exposure argument.
A `SplitExposedDF` was passed without clarifying which exposure basis should be used to summarize results. Hint: Pass "exposure_pol" to `col_exposure` for policy year exposures pass "exposure_cal" to `col_exposure` for calendar exposures.
Experience study results
Target status: Surrender
Study range: 1900-01-01 to 2022-12-31
shape: (1, 4)
┌──────────┬────────┬──────────┬──────────┐
│ n_claims ┆ claims ┆ exposure ┆ q_obs    │
│ ---      ┆ ---    ┆ ---      ┆ ---      │
│ u32      ┆ u32    ┆ f64      ┆ f64      │
╞══════════╪════════╪══════════╪══════════╡
│ 1        ┆ 1      ┆ 35.30411 ┆ 0.028325 │
└──────────┴────────┴──────────┴──────────┘
expose_split() doesn’t just work with calendar year exposures. Calendar quarters, months, or weeks can also be split. For periods shorter than a year, a record is only split into pre- and post-anniversary segments if a policy anniversary appears in the middle of the period.
(xp.ExposedDF.expose_cq(toy_census, "2022-12-31",
                        target_status="Surrender").
    expose_split().
    data.filter(pl.col('pol_num') == 2).
    select('cal_qtr', 'cal_qtr_end', 'pol_yr', 'exposure_pol', 'exposure_cal'))
| cal_qtr | cal_qtr_end | pol_yr | exposure_pol | exposure_cal | 
|---|---|---|---|---|
| date | date | i32 | f64 | f64 | 
| 2011-05-27 | 2011-06-30 | 1 | 0.095628 | 0.384615 | 
| 2011-07-01 | 2011-09-30 | 1 | 0.251366 | 1.0 | 
| 2011-10-01 | 2011-12-31 | 1 | 0.251366 | 1.0 | 
| 2012-01-01 | 2012-03-31 | 1 | 0.248634 | 1.0 | 
| 2012-04-01 | 2012-05-26 | 1 | 0.153005 | 0.615385 | 
| … | … | … | … | … | 
| 2019-10-01 | 2019-12-31 | 9 | 0.251366 | 1.0 | 
| 2020-01-01 | 2020-03-31 | 9 | 0.248634 | 1.0 | 
| 2020-04-01 | 2020-05-26 | 9 | 0.153005 | 0.615385 | 
| 2020-05-27 | 2020-06-30 | 10 | 0.09589 | 0.384615 | 
| 2020-07-01 | 2020-09-30 | 10 | 0.208219 | 0.826087 | 
Note, however, that calendar period exposures will always be expressed in the original units and policy exposures will always be expressed in years. Above, calendar exposures are quarters whereas policy exposures are years.
As a default, ExposedDF() assumes the census data frame uses the following naming conventions:
- pol_num
- status
- issue_date
- term_date
These default names can be overridden using the col_pol_num, col_status, col_issue_date, and col_term_date arguments.
For example, if the policy number column was called id in our census-level data, we could write:
If the census-level data contains other policy attributes like plan type or policy values, they will be broadcast across all exposure periods. Depending on the nature of the data, this may or may not be desirable. Constant policy attributes like plan type make sense to broadcast, but numeric values may or may not depending on the circumstances.
toy_census2 = toy_census.clone().with_columns(
    plan_type=pl.Series(["X", "Y", "Z"]),
    policy_value=pl.Series([100, 125, 90])
)
(xp.ExposedDF(toy_census2, end_date="2022-12-31", target_status="Surrender").
     data.select('pol_num', 'status', 'pol_yr', 'exposure',
                 'plan_type', 'policy_value'))
| pol_num | status | pol_yr | exposure | plan_type | policy_value | 
|---|---|---|---|---|---|
| i64 | enum | u32 | f64 | str | i64 | 
| 1 | "Active" | 1 | 1.0 | "X" | 100 | 
| 1 | "Active" | 2 | 1.0 | "X" | 100 | 
| 1 | "Active" | 3 | 1.0 | "X" | 100 | 
| 1 | "Active" | 4 | 1.0 | "X" | 100 | 
| 1 | "Active" | 5 | 1.0 | "X" | 100 | 
| … | … | … | … | … | … | 
| 3 | "Active" | 9 | 1.0 | "Z" | 90 | 
| 3 | "Active" | 10 | 1.0 | "Z" | 90 | 
| 3 | "Active" | 11 | 1.0 | "Z" | 90 | 
| 3 | "Active" | 12 | 1.0 | "Z" | 90 | 
| 3 | "Surrender" | 13 | 1.0 | "Z" | 90 | 
If your experience study requires a numeric feature that varies over time (e.g., policy values or crediting rates), you can always attach it to an ExposedDF object’s data using a join function.
ExposedDF() does not support studies with multiple changes between an active status and an inactive status.