Dedicated to design and performance of databases and audio systems.
The most frequent code request I receive is, Do you have SQL that will remove redundant valuation records, and re-chain the remaining records accordingly?
First, let's define what a historical valuation is in this context:
Entities with life-cycle capabilities are captured and retained in a historical valuation context. Each change to the entity is captured on a data record with an inclusive Row Effective and an exclusive Row Expiration timepoint. These records start at the inception of the entity and track throughout time to infinity. The timeframes inter-record are discrete and contiguous with no gaps in time, nor overlaps--they all bookend.
For example, an insurance claim may have its inception on 1/1/2012 and be in an open status; close on 1/5/2012; and re-open on 1/10/2012. Therefore, the entity's records may be physically stored as
ID Row Effective Date Row Expiration Date Status Code 1 2012-01-01 2012-01-05 O 1 2012-01-05 2012-01-10 C 1 2012-01-10 9999-12-31 O
Note, the ending bound of one record is equivalent to the beginning bound of the next chronologically sequential record. Again, the beginning bound is inclusive and the ending bound is exclusive; therefore, the claim's status on 1/5/2012 is closed.
With this example, there is no redundancy; however, if we add another field, the claim's status may be redundant across serial records. We modify our example as
ID Row Effective Date Row Expiration Date Status Code Class Code 1 2012-01-01 2012-01-03 O A 1 2012-01-03 2012-01-05 O B 1 2012-01-05 2012-01-10 C B 1 2012-01-10 9999-12-31 O B
Our case need here is to provide solely the Claim ID and Status Code fields with their valuation periods. Since the first two records in our latter example would be redundant, we need a process that transforms this example into a perspective matching the first example (i.e. remove the second record and modify the first record's Row Expiration Date to be equal to the third record's Row Effective Date). This could be for semantic purposes (e.g. remove potential confusion) or due to a data volume need (e.g. keep it small and efficient).
Via SQL we previously would have had to self-join the table three times. That's three passes though the table! A lot of resource consumption from a processing and temporary storage perspective.
Old school method:
lock table clm_t for access select now_pre.clm_id as "Claim ID", now_pre.eff_dte as "Effective Date", coalesce(min(pst.eff_dte),date '9999-12-31') as "Expiration Date", now_pre.sts_cde as "Status Code" from ( select now.clm_id, now.eff_dte, now.sts_cde from clm_t now -- the reference record: ground zero -- the previous record relative to the reference record left outer join clm_t pre on pre.clm_id = now.clm_id and pre.exp_dte = now.eff_dte where -- identifies the first record in chronological sequence pre.clm_id is null or -- only records where the code value has changed now.sts_cde <> pre.sts_cde) now_pre -- all records post the reference record with a different code value left outer join clm_t pst on pst.clm_id = now_pre.clm_id and pst.eff_dte > now_pre.eff_dte and pst.sts_cde <> now_pre.sts_cde ;
New school: via the use of Ordered Analytical Functions resulting in only one pass through the Claim table!
lock table clm_t for access select clm_id as "Claim ID", eff_dte as "Effective Date", -- the next record's Effective Date is this record's Expiration Date coalesce( min(eff_dte) over(partition by clm_id order by eff_dte rows between 1 following and 1 following), date '9999-12-31') as "Expiration Date", sts_cde as "Status Code" from ( select clm_id, eff_dte, sts_cde from clm_t qualify -- identifies the first record in chronological sequence row_number() over(partition by clm_id order by eff_dte) = 1 or -- only records where the code value has changed sts_cde <> min(sts_cde) over(partition by clm_id order by eff_dte rows between 1 preceding and 1 preceding)) foo ;
The performance and efficiency of the single-table pass via Ordered Analytical Functions is vastly superior to the old-school method performing a three-table self-join.