I’ve worked with SAS for a few years now, but I still continue to be surprised at the sheer number of analytical tools in its ‘shed’. Most of the time I get by with the data step, proc sql and a few built-in procedures (freq, sort, summary, tabulate), but occasionally I have a poke around to see what else might be useful.
Today I was working on a data set which contains court monthly throughput numbers – here’s the table definition:
proc sql; create table Throughput_&monthend as select Circuit , Court , Jurisdiction , ProceedingType , MonthEnd , NewBusiness , Acquittal , JudgementGiven , BeforeFixture , OnDay , TotalDisposals , EstHearingDays , OpeningBalance , ClosingBalance from Throughput_main run;
MonthEnd is the time series index variable, and Circuit, Court, Jurisdiction and ProceedingType are dimensions which create cross-sectional time series of all the other variables.
What I needed to generate was a rolling 12-month sum for each individual time series at two different levels: Jurisdiction and ProceedingType. Now in theory this is possible with one or two data steps, but to be honest the thought of attempting that with all the RETAINs and BY variables involved makes me cringe. Luckily for me, the SAS/ETS (Econometrics and Time Series) module has more built-in procedures for working with time series than you can shake a stick at. Here’s how I approached this problem.
First, I created a time series at the Jurisdiction level:
proc sort data=Throughput_&monthend out=Throughput_Juris_std; by circuit court jurisdiction monthend; run; proc timeseries data=Throughput_Juris_std out=TS_Juris_tmp (label="Throughput Time Series at Jurisdictional level"); ID MonthEnd Interval=MONTH; VAR NewBusiness / ACCUMULATE=TOTAL SETMISSING=MISSING; VAR Acquittal / ACCUMULATE=TOTAL SETMISSING=MISSING; VAR JudgementGiven / ACCUMULATE=TOTAL SETMISSING=MISSING; VAR BeforeFixture / ACCUMULATE=TOTAL SETMISSING=MISSING; VAR OnDay / ACCUMULATE=TOTAL SETMISSING=MISSING; VAR TotalDisposals / ACCUMULATE=TOTAL SETMISSING=MISSING; VAR EstHearingDays / ACCUMULATE=TOTAL SETMISSING=MISSING; VAR OpeningBalance / ACCUMULATE=TOTAL SETMISSING=MISSING; VAR ClosingBalance / ACCUMULATE=TOTAL SETMISSING=MISSING; BY circuit Court Jurisdiction; run;
The Timeseries procedure creates the required Jurisdiction-level totals for each month. It’s really aimed at aggregating transactional data within intervals, but it will also work to aggregate within BY groups. Easy way to code this in SAS Enterprise Guide: select the data set, go to the pull-down menus and select Analyze -> Time Series -> Create Time Series Data. This will bring up a wizard which asks you to specify the variable roles, and how the time series variables will be treated. Run this, then open the ‘Last Submitted Code’ project node to get pre-written code that can be modified to suit.
Next step is to create the 12-month sums:
proc expand data=TS_Juris_tmp out=sasdata.TS_Juris_12mth (label="Throughput Rolling 12 month Time Series") from=Month align=ending method=none; by circuit court jurisdiction; id monthend; convert OpeningBalance = OpeningBalance_12mth / transformin=(lag 11); label OpeningBalance_12mth="Opening Balance 12 months prior"; convert NewBusiness = NewBusiness_12mth / transformin=(nomiss movsum 12); label NewBusiness_12mth="New Business 12 month rollup"; convert Acquittal = Acquittal_12mth / transformin=(nomiss movsum 12); label Acquittal_12mth="Fixture Heard - Acquittal 12 month rollup"; convert JudgementGiven = JudgementGiven_12mth / transformin=(nomiss movsum 12); label JudgementGiven_12mth="Fixture Heard - Judgment given 12 month rollup"; convert BeforeFixture = BeforeFixture_12mth / transformin=(nomiss movsum 12); label BeforeFixture_12mth="Before Fixture 12 month rollup"; convert OnDay = OnDay_12mth / transformin=(nomiss movsum 12); label OnDay_12mth="On day of Fixture 12 month rollup"; convert TotalDisposals = TotalDisposals_12mth / transformin=(nomiss movsum 12); label TotalDisposals_12mth="Total Disposals 12 month rollup"; convert EstHearingDays = EstHearingDays_12mth / transformin=(nomiss movsum 12); label EstHearingDays_12mth="Estimate of hearing days 12 month rollup"; run; quit;
The expand procedure allows you to specify transformations for each time series variable using the convert / transformin syntax. In this case I’m selecting the opening balance 11 months ago (lag 11) to recover the opening balance at the start of the 12-month window, which I’m naming OpeningBalance_12mth. For all the other variables except ClosingBalance I add in a sum over a 12-month moving window (movsum 12).
As with proc timeseries, EG has a pull-down menu item which is nice for generating the code (or for including in the project) – Analyze -> Time Series -> Prepare Time Series Data.
And that’s pretty much it – easy. Not only is this much quicker to generate than the equivalent data step code, by virtue of being far less complicated it’s also much easier to maintain. If I ever need to change it, I’m fairly confident I can do it without too much drama.