Convert a data frame of census-level records to exposure-level records.
Usage
expose(
  .data,
  end_date,
  start_date = as.Date("1900-01-01"),
  target_status = NULL,
  cal_expo = FALSE,
  expo_length = c("year", "quarter", "month", "week"),
  col_pol_num = "pol_num",
  col_status = "status",
  col_issue_date = "issue_date",
  col_term_date = "term_date",
  default_status
)
expose_py(...)
expose_pq(...)
expose_pm(...)
expose_pw(...)
expose_cy(...)
expose_cq(...)
expose_cm(...)
expose_cw(...)
Arguments
- .data
A data frame with census-level records
- end_date
Experience study end date
- start_date
Experience study start date. Default value = 1900-01-01.
- target_status
Character vector of target status values. Default value = NULL.
- cal_expo
Set to TRUE for calendar year exposures. Otherwise policy year exposures are assumed.
- expo_length
Exposure period length: "year" (the default), "quarter", "month", or "week"
- col_pol_num
Name of the column in .data containing the policy number
- col_status
Name of the column in .data containing the policy status
- col_issue_date
Name of the column in .data containing the issue date
- col_term_date
Name of the column in .data containing the termination date
- default_status
Optional scalar character representing the default active status code. If not provided, the most common status is assumed.
- ...
Arguments passed to expose()
Value
A tibble with class exposed_df, tbl_df, tbl, and data.frame. The results include all existing columns in .data plus new columns for exposures and observation periods. Observation periods include counters for policy exposures, start dates, and end dates. Both start dates and end dates are inclusive bounds.
For policy year exposures, two observation period columns are returned. Columns beginning with pol_ are integer policy periods. Columns beginning with pol_date_ are calendar dates representing anniversary dates, monthiversary dates, etc.
Details
Census-level data refers to a data set wherein there is one row per unique policy. Exposure-level data expands census-level data such that there is one record per policy per observation period. Observation periods could be any meaningful period of time such as a policy year, policy month, calendar year, calendar quarter, calendar month, etc.
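To make the census-to-exposure expansion concrete, here is a minimal base R sketch for a single policy and policy-year periods. It is not the package's implementation: the census record is made up, the column names simply mirror the defaults of expose(), and the sketch neither computes exposures nor truncates the final period at the study end date.

```r
# Hypothetical one-row census record (column names mirror expose() defaults)
census <- data.frame(
  pol_num = 1L,
  status = "Active",
  issue_date = as.Date("2018-03-15"),
  term_date = as.Date(NA)
)
end_date <- as.Date("2020-12-31")

# Policy-year start dates: the issue date and each anniversary up to the study end
starts <- seq(census$issue_date, end_date, by = "year")

# Each policy year ends the day before the next anniversary (inclusive bounds)
ends <- seq(census$issue_date, by = "year", length.out = length(starts) + 1)[-1] - 1

# One record per policy per observation period
exposure_level <- data.frame(
  pol_num = census$pol_num,
  pol_yr = seq_along(starts),
  pol_date_yr = starts,
  pol_date_yr_end = ends
)
exposure_level
```

The single census row becomes one row per policy year, which is the shape that expose() returns alongside the original columns and an exposure column.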
target_status is used in the calculation of exposures. The annual exposure method is applied, which allocates a full period of exposure for any statuses in target_status. For all other statuses, new entrants and exits are partially exposed based on the time elapsed in the observation period. This method is consistent with the Balducci Hypothesis, which assumes that the probability of termination is proportionate to the time elapsed in the observation period. If the annual exposure method isn't desired, target_status can be ignored. In this case, partial exposures are always applied regardless of status.
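The difference between full and partial exposure can be illustrated with a small worked calculation. This is a sketch only: the dates are invented, exposure_for() is a hypothetical helper, and the inclusive day-count convention used here is an assumption rather than a statement of how the package counts days.

```r
# One observation period (a policy year) and a termination partway through it
period_start <- as.Date("2020-01-01")
period_end <- as.Date("2020-12-31")
term_date <- as.Date("2020-04-09")

# Inclusive day counts (assumed convention for this sketch)
days_in_period <- as.numeric(period_end - period_start) + 1
days_exposed <- as.numeric(term_date - period_start) + 1

# Hypothetical helper: full exposure for target statuses, partial otherwise
exposure_for <- function(status, target_status = NULL) {
  if (status %in% target_status) 1 else days_exposed / days_in_period
}

exposure_for("Surrender", target_status = "Surrender")  # full period of exposure
exposure_for("Death", target_status = "Surrender")      # partial exposure
exposure_for("Surrender")                               # partial: no target status set
```

With target_status left as NULL, every exit receives partial exposure, which corresponds to the case where the annual exposure method isn't desired.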
default_status is used to indicate the default active status that should be used when exposure records are created.
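When default_status is omitted, the most common status is assumed. One way to picture that fallback in base R is sketched below. The status vector is made up, and how the package breaks ties between equally common statuses is not covered by this sketch.

```r
# Hypothetical status column from a census data frame
status <- factor(c("Active", "Active", "Surrender", "Active", "Death"))

# The most frequent level serves as the fallback default status
default_status <- names(which.max(table(status)))
default_status
```

Passing default_status explicitly avoids relying on this inference, which may matter when terminated policies outnumber active ones.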
Policy period and calendar period variations
The functions expose_py(), expose_pq(), expose_pm(), expose_pw(), expose_cy(), expose_cq(), expose_cm(), and expose_cw() are convenience functions for specific implementations of expose(). The two characters after the underscore describe the exposure type and exposure period, respectively.
For exposure types:
- p refers to policy years
- c refers to calendar years
For exposure periods:
- y = years
- q = quarters
- m = months
- w = weeks
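The naming scheme can be read as a lookup from the two-character suffix to the cal_expo and expo_length arguments of expose(). The helper below, suffix_to_args(), is hypothetical and not part of the package. It only spells out the mapping implied by the lists above.

```r
# Hypothetical helper: translate a suffix such as "cq" into expose() arguments
suffix_to_args <- function(suffix) {
  type <- substr(suffix, 1, 1)    # "p" = policy, "c" = calendar
  period <- substr(suffix, 2, 2)  # "y", "q", "m", or "w"
  stopifnot(type %in% c("p", "c"), period %in% c("y", "q", "m", "w"))

  list(
    cal_expo = type == "c",
    expo_length = c(y = "year", q = "quarter", m = "month", w = "week")[[period]]
  )
}

str(suffix_to_args("cq"))
str(suffix_to_args("pm"))
```

Under this reading, expose_cq(...) would correspond to calling expose() with cal_expo = TRUE and expo_length = "quarter".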
All columns containing dates must be in YYYY-MM-DD format.
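If source dates arrive in another layout, they can be converted before calling expose(). The sketch below uses base R's as.Date() with an explicit format string. The raw data frame and its month/day/year layout are assumptions made for illustration.

```r
# Hypothetical census extract with dates stored as month/day/year text
raw <- data.frame(
  pol_num = 1:2,
  issue_date = c("03/15/2018", "11/01/2019"),
  term_date = c(NA, "06/30/2020")
)

# Parse the text into Date objects, which print as YYYY-MM-DD
raw$issue_date <- as.Date(raw$issue_date, format = "%m/%d/%Y")
raw$term_date <- as.Date(raw$term_date, format = "%m/%d/%Y")

format(raw$issue_date)
```

Missing termination dates stay NA after conversion, matching how active policies appear in the examples below.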
References
Atkinson and McGarry (2016). Experience Study Calculations. https://www.soa.org/49378a/globalassets/assets/files/research/experience-study-calculations.pdf
See also
expose_split() for information on splitting calendar year exposures by policy year.
Examples
toy_census |> expose("2020-12-31")
#>
#> ── Exposure data ──
#>
#> • Exposure type: policy_year
#> • Target status:
#> • Study range: 1900-01-01 to 2020-12-31
#>
#> # A tibble: 33 × 8
#>    pol_num status issue_date term_date pol_yr pol_date_yr pol_date_yr_end
#>      <int> <fct>  <date>     <date>     <int> <date>      <date>
#>  1       1 Active 2010-01-01 NA             1 2010-01-01  2010-12-31
#>  2       1 Active 2010-01-01 NA             2 2011-01-01  2011-12-31
#>  3       1 Active 2010-01-01 NA             3 2012-01-01  2012-12-31
#>  4       1 Active 2010-01-01 NA             4 2013-01-01  2013-12-31
#>  5       1 Active 2010-01-01 NA             5 2014-01-01  2014-12-31
#>  6       1 Active 2010-01-01 NA             6 2015-01-01  2015-12-31
#>  7       1 Active 2010-01-01 NA             7 2016-01-01  2016-12-31
#>  8       1 Active 2010-01-01 NA             8 2017-01-01  2017-12-31
#>  9       1 Active 2010-01-01 NA             9 2018-01-01  2018-12-31
#> 10       1 Active 2010-01-01 NA            10 2019-01-01  2019-12-31
#> # ℹ 23 more rows
#> # ℹ 1 more variable: exposure <dbl>
census_dat |> expose_py("2019-12-31", target_status = "Surrender")
#>
#> ── Exposure data ──
#>
#> • Exposure type: policy_year
#> • Target status: Surrender
#> • Study range: 1900-01-01 to 2019-12-31
#>
#> # A tibble: 141,252 × 15
#>    pol_num status issue_date inc_guar qual    age product gender wd_age premium
#>      <int> <fct>  <date>     <lgl>    <lgl> <int> <fct>   <fct>   <int>   <dbl>
#>  1       1 Active 2014-12-17 TRUE     FALSE    56 b       F          77     370
#>  2       1 Active 2014-12-17 TRUE     FALSE    56 b       F          77     370
#>  3       1 Active 2014-12-17 TRUE     FALSE    56 b       F          77     370
#>  4       1 Active 2014-12-17 TRUE     FALSE    56 b       F          77     370
#>  5       1 Active 2014-12-17 TRUE     FALSE    56 b       F          77     370
#>  6       1 Active 2014-12-17 TRUE     FALSE    56 b       F          77     370
#>  7       2 Active 2007-09-24 FALSE    FALSE    71 a       F          71     708
#>  8       2 Active 2007-09-24 FALSE    FALSE    71 a       F          71     708
#>  9       2 Active 2007-09-24 FALSE    FALSE    71 a       F          71     708
#> 10       2 Active 2007-09-24 FALSE    FALSE    71 a       F          71     708
#> # ℹ 141,242 more rows
#> # ℹ 5 more variables: term_date <date>, pol_yr <int>, pol_date_yr <date>,
#> #   pol_date_yr_end <date>, exposure <dbl>