b¬äd†ødd

Dedicated to design and performance of databases and audio systems.

Historical Valuation Compression

The most frequent code request I receive is, Do you have SQL that will remove redundant valuation records, and re-chain the remaining records accordingly?

First, let's define what a historical valuation is in this context:
Entities with life-cycle capabilities are captured and retained in a historical valuation context. Each change to the entity is captured on a data record with an inclusive Row Effective and an exclusive Row Expiration timepoint. These records start at the inception of the entity and track throughout time to infinity. The timeframes inter-record are discrete and contiguous with no gaps in time, nor overlaps--they all bookend.

For example, an insurance claim may have its inception on 1/1/2012 and be in an open status; close on 1/5/2012; and re-open on 1/10/2012. Therefore, the entity's records may be physically stored as

ID  Row Effective Date  Row Expiration Date  Status Code
1   2012-01-01          2012-01-05           O
1   2012-01-05          2012-01-10           C
1   2012-01-10          9999-12-31           O

Note, the ending bound of one record is equivalent to the beginning bound of the next chronologically sequential record. Again, the beginning bound is inclusive and the ending bound is exclusive; therefore, the claim's status on 1/5/2012 is closed.

With this example, there is no redundancy; however, if we add another field, the claim's status may be redundant across serial records. We modify our example as

ID  Row Effective Date  Row Expiration Date  Status Code  Class Code
1   2012-01-01          2012-01-03           O            A
1   2012-01-03          2012-01-05           O            B
1   2012-01-05          2012-01-10           C            B
1   2012-01-10          9999-12-31           O            B

Our case need here is to provide solely the Claim ID and Status Code fields with their valuation periods. Since the first two records in our latter example would be redundant, we need a process that transforms this example into a perspective matching the first example (i.e. remove the second record and modify the first record's Row Expiration Date to be equal to the third record's Row Effective Date). This could be for semantic purposes (e.g. remove potential confusion) or due to a data volume need (e.g. keep it small and efficient).

Via SQL we previously would have had to self-join the table three times. That's three passes though the table! A lot of resource consumption from a processing and temporary storage perspective.

Old school method:

lock table clm_t for access
select
    now_pre.clm_id                               as "Claim ID",
    now_pre.eff_dte                              as "Effective Date",
    coalesce(min(pst.eff_dte),date '9999-12-31') as "Expiration Date",
    now_pre.sts_cde                              as "Status Code"

from (
    select
        now.clm_id,
      	now.eff_dte,
      	now.sts_cde

    from clm_t now -- the reference record: ground zero

      -- the previous record relative to the reference record
      left outer join clm_t pre on
      	pre.clm_id  = now.clm_id and
      	pre.exp_dte = now.eff_dte
 
    where
        -- identifies the first record in chronological sequence
      	pre.clm_id is null
        or
        -- only records where the code value has changed
      	now.sts_cde <> pre.sts_cde) now_pre 

  -- all records post the reference record with a different code value
  left outer join clm_t pst on
    pst.clm_id  =  now_pre.clm_id  and
    pst.eff_dte >  now_pre.eff_dte and
    pst.sts_cde <> now_pre.sts_cde
;

New school: via the use of Ordered Analytical Functions resulting in only one pass through the Claim table!

lock table clm_t for access
select
    clm_id  as "Claim ID",
    eff_dte as "Effective Date",

    -- the next record's Effective Date is this record's Expiration Date
    coalesce(
        min(eff_dte) over(partition by clm_id
                          order     by eff_dte
                          rows between 1 following
                                   and 1 following),
        date '9999-12-31') as "Expiration Date",

 	sts_cde as "Status Code"
 
from (
    select
        clm_id,
      	eff_dte,
      	sts_cde

    from clm_t

    qualify
        -- identifies the first record in chronological sequence
        row_number() over(partition by clm_id
                          order     by eff_dte) = 1      
      	or
        -- only records where the code value has changed
      	sts_cde <> min(sts_cde) over(partition by clm_id 
                                     order     by eff_dte
                                     rows between 1 preceding
                                              and 1 preceding)) foo
;

The performance and efficiency of the single-table pass via Ordered Analytical Functions is vastly superior to the old-school method performing a three-table self-join.

Cheers
-Brad