Dedicated to design and performance of databases and audio systems.
The most frequent code request I receive is, Do you have SQL that will remove redundant valuation records, and re-chain the remaining records accordingly?
First, let's define what a historical valuation is in this context:
Entities with life-cycle capabilities are captured and retained in a historical valuation context. Each change to the entity is captured on a data record with an inclusive Row Effective and an exclusive Row Expiration timepoint. These records start at the inception of the entity and track throughout time to infinity. The timeframes inter-record are discrete and contiguous with no gaps in time, nor overlaps--they all bookend.
For example, an insurance claim may have its inception on 1/1/2012 and be in an open status; close on 1/5/2012; and re-open on 1/10/2012. Therefore, the entity's records may be physically stored as
ID Row Effective Date Row Expiration Date Status Code
1 2012-01-01 2012-01-05 O
1 2012-01-05 2012-01-10 C
1 2012-01-10 9999-12-31 O
Note, the ending bound of one record is equivalent to the beginning bound of the next chronologically sequential record. Again, the beginning bound is inclusive and the ending bound is exclusive; therefore, the claim's status on 1/5/2012 is closed.
With this example, there is no redundancy; however, if we add another field, the claim's status may be redundant across serial records. We modify our example as
ID Row Effective Date Row Expiration Date Status Code Class Code
1 2012-01-01 2012-01-03 O A
1 2012-01-03 2012-01-05 O B
1 2012-01-05 2012-01-10 C B
1 2012-01-10 9999-12-31 O B
Our case need here is to provide solely the Claim ID and Status Code fields with their valuation periods. Since the first two records in our latter example would be redundant, we need a process that transforms this example into a perspective matching the first example (i.e. remove the second record and modify the first record's Row Expiration Date to be equal to the third record's Row Effective Date). This could be for semantic purposes (e.g. remove potential confusion) or due to a data volume need (e.g. keep it small and efficient).
Via SQL we previously would have had to self-join the table three times. That's three passes though the table! A lot of resource consumption from a processing and temporary storage perspective.
Old school method:
lock table clm_t for access
select
now_pre.clm_id as "Claim ID",
now_pre.eff_dte as "Effective Date",
coalesce(min(pst.eff_dte),date '9999-12-31') as "Expiration Date",
now_pre.sts_cde as "Status Code"
from (
select
now.clm_id,
now.eff_dte,
now.sts_cde
from clm_t now -- the reference record: ground zero
-- the previous record relative to the reference record
left outer join clm_t pre on
pre.clm_id = now.clm_id and
pre.exp_dte = now.eff_dte
where
-- identifies the first record in chronological sequence
pre.clm_id is null
or
-- only records where the code value has changed
now.sts_cde <> pre.sts_cde) now_pre
-- all records post the reference record with a different code value
left outer join clm_t pst on
pst.clm_id = now_pre.clm_id and
pst.eff_dte > now_pre.eff_dte and
pst.sts_cde <> now_pre.sts_cde
;
New school: via the use of Ordered Analytical Functions resulting in only one pass through the Claim table!
lock table clm_t for access
select
clm_id as "Claim ID",
eff_dte as "Effective Date",
-- the next record's Effective Date is this record's Expiration Date
coalesce(
min(eff_dte) over(partition by clm_id
order by eff_dte
rows between 1 following
and 1 following),
date '9999-12-31') as "Expiration Date",
sts_cde as "Status Code"
from (
select
clm_id,
eff_dte,
sts_cde
from clm_t
qualify
-- identifies the first record in chronological sequence
row_number() over(partition by clm_id
order by eff_dte) = 1
or
-- only records where the code value has changed
sts_cde <> min(sts_cde) over(partition by clm_id
order by eff_dte
rows between 1 preceding
and 1 preceding)) foo
;
The performance and efficiency of the single-table pass via Ordered Analytical Functions is vastly superior to the old-school method performing a three-table self-join.
Cheers
-Brad