Time Series Analysis in SAS

I’ve worked with SAS for a few years now, but I still continue to be surprised at the sheer number of analytical tools in its ‘shed’. Most of the time I get by with the data step, proc sql and a few built-in procedures (freq, sort, summary, tabulate), but occasionally I have a poke around to see what else might be useful.

Today I was working on a data set which contains court monthly throughput numbers – here’s the table definition:

proc sql;
  create table Throughput_&monthend as
  select
    Circuit
  , Court
  , Jurisdiction
  , ProceedingType
  , MonthEnd
  , NewBusiness
  , Acquittal
  , JudgementGiven
  , BeforeFixture
  , OnDay
  , TotalDisposals
  , EstHearingDays
  , OpeningBalance
  , ClosingBalance
  from
    Throughput_main
run;

MonthEnd is the time series index variable, and Circuit, Court, Jurisdiction and ProceedingType are dimensions which create cross-sectional time series of all the other variables.

What I needed to generate was a rolling 12-month sum for each individual time series at two different levels: Jurisdiction and ProceedingType. Now in theory this is possible with one or two data steps, but to be honest the thought of attempting that with all the RETAINs and BY variables involved makes me cringe. Luckily for me, the SAS/ETS (Econometrics and Time Series) module has more built-in procedures for working with time series than you can shake a stick at. Here’s how I approached this problem.

First, I created a time series at the Jurisdiction level:

proc sort data=Throughput_&monthend
	out=Throughput_Juris_std;
	by circuit court jurisdiction monthend;
run;

proc timeseries data=Throughput_Juris_std
				out=TS_Juris_tmp (label="Throughput Time Series at Jurisdictional level");
	ID MonthEnd 	Interval=MONTH;
	VAR NewBusiness /	ACCUMULATE=TOTAL SETMISSING=MISSING;
	VAR Acquittal /	ACCUMULATE=TOTAL SETMISSING=MISSING;
	VAR JudgementGiven /	ACCUMULATE=TOTAL SETMISSING=MISSING;
	VAR BeforeFixture /	ACCUMULATE=TOTAL SETMISSING=MISSING;
	VAR OnDay /	ACCUMULATE=TOTAL SETMISSING=MISSING;
	VAR TotalDisposals /	ACCUMULATE=TOTAL SETMISSING=MISSING;
	VAR EstHearingDays /	ACCUMULATE=TOTAL SETMISSING=MISSING;
	VAR OpeningBalance /	ACCUMULATE=TOTAL SETMISSING=MISSING;
	VAR ClosingBalance /	ACCUMULATE=TOTAL SETMISSING=MISSING;
	BY circuit Court Jurisdiction;
run;

The Timeseries procedure creates the required Jurisdiction-level totals for each month. It’s really aimed at aggregating transactional data within intervals, but it will also work to aggregate within BY groups. Easy way to code this in SAS Enterprise Guide: select the data set, go to the pull-down menus and select Analyze -> Time Series -> Create Time Series Data. This will bring up a wizard which asks you to specify the variable roles, and how the time series variables will be treated. Run this, then open the ‘Last Submitted Code’ project node to get pre-written code that can be modified to suit.

Next step is to create the 12-month sums:

proc expand data=TS_Juris_tmp
			out=sasdata.TS_Juris_12mth (label="Throughput Rolling 12 month Time Series")
			from=Month
			align=ending
			method=none;
	by circuit court jurisdiction;
	id monthend;
	convert OpeningBalance = OpeningBalance_12mth /
		transformin=(lag 11);
	label OpeningBalance_12mth="Opening Balance 12 months prior";
	convert NewBusiness = NewBusiness_12mth /
		transformin=(nomiss movsum 12);
	label NewBusiness_12mth="New Business 12 month rollup";
	convert Acquittal = Acquittal_12mth /
		transformin=(nomiss movsum 12);
	label Acquittal_12mth="Fixture Heard - Acquittal 12 month rollup";
	convert JudgementGiven = JudgementGiven_12mth /
		transformin=(nomiss movsum 12);
	label JudgementGiven_12mth="Fixture Heard - Judgment given 12 month rollup";
	convert BeforeFixture = BeforeFixture_12mth /
		transformin=(nomiss movsum 12);
	label BeforeFixture_12mth="Before Fixture 12 month rollup";
	convert OnDay = OnDay_12mth /
		transformin=(nomiss movsum 12);
	label OnDay_12mth="On day of Fixture 12 month rollup";
	convert TotalDisposals = TotalDisposals_12mth /
		transformin=(nomiss movsum 12);
	label TotalDisposals_12mth="Total Disposals 12 month rollup";
	convert EstHearingDays = EstHearingDays_12mth /
		transformin=(nomiss movsum 12);
	label EstHearingDays_12mth="Estimate of hearing days 12 month rollup";
run;
quit;

The expand procedure allows you to specify transformations for each time series variable using the convert / transformin syntax. In this case I’m selecting the opening balance 11 months ago (lag 11) to recover the opening balance at the start of the 12-month window, which I’m naming OpeningBalance_12mth. For all the other variables except ClosingBalance I add in a sum over a 12-month moving window (movsum 12).

As with proc timeseries, EG has a pull-down menu item which is nice for generating the code (or for including in the project) – Analyze -> Time Series -> Prepare Time Series Data.

And that’s pretty much it – easy. Not only is this much quicker to generate than the equivalent data step code, by virtue of being far less complicated it’s also much easier to maintain. If I ever need to change it, I’m fairly confident I can do it without too much drama.

QueryCell – SQL in Excel

QueryCell is an Excel add-in from Oak Focus Software that gives Excel users the ability to use SQL in Excel. Version 1.4 was released this month and lead developer Sam Howley gave me the chance to take a test drive, so I downloaded it and tried it out.

Before I get into what I thought of it, I know that Dick Kusleika at Daily Dose of Excel reviewed QueryCell last year. A number of questions were asked in comments then around what QueryCell provides that isn’t already available in Excel. It’s a fair question – after all, SQL in Excel is not a new concept – either using VBA and ADO, or MS Query. Those are all great tools for querying data from and in Excel, after you get to know how to apply them. After spending some time using QueryCell, I think the standout difference can be summed up with two pairs of hyphenated words: single-click and user-friendly. From the worksheet to the SQL editor: one click. No need to select the data source, specify the driver or write a connection string. One click and QueryCell opens up, auto-detects all data regions in the active workbook(s) and has the SQL editor ready to go.

Also, QueryCell provides two features that really sold me: Table DDL and the SQL Insert Generator. The Table DDL window displays a CREATE TABLE statement for the selected table:

Checking the ‘Use Custom Table DDL’ checkbox allows you to edit the statement, which is handy: if a column in the table has a reserved word (like Date or Case) as a title, QueryCell will auto-assign a generated name to replace it in the DDL. This is necessary in order to allow the Firebird database that QueryCell uses at the backend to accept the data. Also, the data type assigned to the columns is automatically set to VARCHAR(512), unless the columns have suffixes which identify them as another type, so these may need to be changed.

The SQL Insert Generator window is available on right-clicking inside a table on the worksheet, and generates an INSERT statement for each row in the table:

The ‘View Generator SQL’ button allows you to see the select statement that’s used to generate the INSERTs, which can be modified and re-run if required. The statements can then be copied to the clipboard, or saved to a script file.

What is it that makes these features so useful? Both the Table DDL and the Insert Generator appeal to me as tools to assist with migrating data from Excel to a database, which is a battle I’ve had to fight on more than one occasion.

Some other cool features: both the SQL Editor in the main window and the Insert Generator feature a ‘Format SQL’ button, which is nice. Column names are auto-completed in the SQL editor – also very nice. There’s a test data generator, which is cool for getting development going without needing to have data in the right shape beforehand, and a well-developed SQL/QueryCell tutorial.

Overall impressions: I’m really taken with QueryCell. It delivers some great functionality which I see as hugely useful in the context of what I do with Excel day-to-day. The UI has a very polished and responsive feel, and makes good use of space. It’s also very intuitive and easy to use, which as I said earlier is one of the key differences that sell it to me over existing alternatives. It’s modestly priced at $45 US, and offers a fully-functional trial download which is good for 30 queries before expiry. Well worth giving a whirl, which I thoroughly recommend.

Special Offer: Sam has generously provided me with 4 free licences to give away. The first four to email me (address at About Me) with “QueryCell” in the subject line will get a licence key sent back to them – please note I won’t be able to reply straightaway, but you can expect a reply within 24 hours.

Edit: Thanks for all the emails, people – the four licences have now all gone to good homes. I’d encourage you to download the trial anyway and see what you think…