Lags and Unintended Consequences

Analytic functions like LAG, LEAD, FIRST_VALUE and LAST_VALUE are a very useful addition to Oracle SQL, enabling retrieval of aggregate results without the need for self-joins. LAG, for instance, will allow you to get the value of a column from the previous row in a group of rows.

Here’s an example of where I used this recently. I was attempting to monitor transfers of cases between offices by reading from a table CASE_MGMT_SEQ containing a record of which offices managed which cases, which held a row for each office in the ‘management sequence’ of the case history. Here’s roughly what that looked like:

The column MGMT_SEQ tells us the position of each row in the case management sequence. Each row in the case sequence ends on the same date as the next row starts, and it is possible (although not shown in this example) for the next row to be in the same office as in the previous row. From this view we can tell that case A was managed in Wellington from Feb 1 to Feb 20 this year, then was moved to Auckland from Feb 20 to Feb 25, and finally moved back to Wellington, where it remained until Mar 10.

The SQL to track transfers needed to tell me (among other things; I’ve simplified this considerably) which case we’re transferring, when the transfer happened, where it was transferred to, and where it was transferred from:

WITH TRANSFERS AS
  (SELECT CASE_ID ,
    LAG(OFFICE) OVER (PARTITION BY CASE_ID ORDER BY CASE_ID, MGMT_SEQ) AS PREV_OFFICE,
    OFFICE ,
    START_DATE,
    END_DATE
  FROM CASE_MGMT_SEQ
  WHERE START_DATE BETWEEN TO_DATE(:STDATE, 'yyyy/mm/dd') AND TO_DATE(:ENDDATE, 'yyyy/mm/dd')
  )
SELECT 
  CASE_ID,
  PREV_OFFICE AS TRANSFER_FROM,
  OFFICE AS TRANSFER_TO,
  START_DATE AS TRANSFER_DATE
FROM TRANSFERS
WHERE (OFFICE  = :OFFICE
OR PREV_OFFICE = :OFFICE)
AND OFFICE    <> PREV_OFFICE;

The report returns for a given office any transfers in or out of a specified office over a period bounded by STDATE and ENDDATE.

But something weird is happening. I try running this report with the OFFICE parameter set to Wellington, STDATE set to 2012/02/21 and ENDDATE set to 2012/02/27, and here’s what I get:

There’s a transfer of case C from Wellington to Hamilton on Feb 24, certainly. The row where case C is managed in Wellington from Feb 23 to Feb 24 shouldn’t be counted as a transfer, as there is no previous office – the lag will return a null in this case, so the line OFFICE PREV_OFFICE in the WHERE clause will return null and hence the row will be filtered out. But there’s also case A’s transfer into Wellington from Auckland on Feb 25. There is definitely a previous office in that case. What’s going on?

The problem here is that I wasn’t paying attention to the order in which the clauses execute. It’s natural to assume that because the SELECT clause comes first, it gets executed first. In fact, the query SELECT-FROM-WHERE is executed FROM-WHERE-SELECT. This means that inside the temp table TRANSFERS, the where clause filters out rows with a start date outside the date bounds before the lag function gets to calculate the previous office. Here’s what I get when I just run the TRANSFERS sub-query with the same date parameters:

The row where case A was managed in Auckland from Feb 20 to Feb 25 is filtered out by WHERE as the start date is not within the specified bounds. Hence the window that LAG uses in SELECT to calculate the previous office has no row to look back to for the next row in the case A sequence, and this causes a null to be generated. When the WHERE clause in the outer query compares OFFICE and PREV_OFFICE, a null is returned and so this row doesn’t make it through.

So what can we do about this? Well, some people will say wrap the null values in NVL. That’s fine if you want a default value to come through, but in this case I would actually like to see the previous office – the logical fault lies in the sub-query rather than the outer query. The problem arises because the previous row is not being passed through to SELECT, so we need to get it included somehow.

In the end this was quite simple – we’re only filtering based on START_DATE, but if we include the same filter on END_DATE then the previous row will come through, by virtue of the fact that each row in the sequence ends on the same day the subsequent row starts. Here’s how the report is modified with a change to the WHERE clause in the sub-query:

WHERE START_DATE BETWEEN TO_DATE(:STDATE, 'yyyy/mm/dd') AND TO_DATE(:ENDDATE, 'yyyy/mm/dd')
  OR END_DATE BETWEEN TO_DATE(:STDATE, 'yyyy/mm/dd') AND TO_DATE(:ENDDATE, 'yyyy/mm/dd')

And the transfer from Auckland to Wellington shows up:

Reference for LAG and other analytic functions: http://psoug.org/reference/analytic_functions.html