Tricks with _TYPE_

In SAS I frequently make use of the very handy SUMMARY procedure. It will produce a wide range of aggregate results at whatever class level you specify – or multiple levels if you like. So given a dataset costs with measure variable total_cost and class variables month, age, tenure and location, I might wish to see one-way analyses of cost by all of these classes, and possibly two-way analysis of month vs age and month vs tenure. So I could write a whole lot of summary procedures like the below:

/*Month 1-way analysis*/
proc summary data=costs nway missing ;
	class month ;
	var   total_cost ;
	output out=cost_month_summary (drop=_type_ rename=(_freq_=count))
		min= max= p5= q1= median= q3= p95= mean= sum= /autoname ;
run;

/*Month by Age 2-way analysis*/
proc summary data=costs nway missing ;
	class month age;
	var   total_cost ;
	output out=cost_month_age_summary (drop=_type_ rename=(_freq_=count))
		min= max= p5= q1= median= q3= p95= mean= sum= /autoname ;
run;
/*etc, etc.*/

Now while that will work, it’s a lot of code to write and maintain. I prefer to request all the required results at once, in one proc:

proc summary data=costs missing ;
	class month age tenure location;
	var   total_cost;
	types /*One-way analyses*/
		  month age tenure location
		  /*two-way*/
		  month*age
		  month*tenure
	;
	output out=cost_summary (rename=(_freq_=count))
		min= max= p5= q1= median= q3= p95= mean= sum= /autoname ;
run;

The change that makes this work is of course the type statement, and removing the nway option (we no longer want an n-way analysis).

The output dataset will then have a column for every class variable, one for each requested statistic (including the renamed _freq_), and crucially we’ve also kept the autogenerated _type_ variable. _Type_ is the variable which tells you which summary request you’re looking at, assigning numbers to each combination of variables according to a simple binary pattern. Basically given n class variables arranged in logical order A1 … An in the output dataset, working from right to left each is assigned a value from 20 … 2n-1, so Am gets 2n-m. In our example, if month, age, tenure and location have logical positions 1, 2, 3 and 4 respectively, then month=23, age=22, tenure=21 and location=20. That means the requested analysis types get the following _type_ values:

month = 23 = 8
age = 22 = 4
tenure = 21 = 2
location = 20 = 1
month*tenure = 23 + 21 = 10
month*age = 23 + 22 = 12

This works fine for me, but the _type_ variable is a number, which is unhelpful to an audience without the technical knowledge, time or patience to decode it, so it would be great to be able to systematically assign it a descriptive label. Of course, given a combination of class variables in a summary dataset, it’s possible to work out the value of _type_ that has been assigned to them. Turns out, this paper from SUGI 31 by Dan Bruns contains the macro %findtype written by Ron Coleman which will examine a summary dataset table and return the value of _type_ for the variable list vars:

%macro findtype(table,vars);
   %local index fid flag target;
   %let fid = %sysfunc(open(&table,i));
   %if not &fid %then %goto exit;
   %let vars = %upcase(&vars);
   %let flag = 0;
   %let _type_ = 0;
   %do i = %sysfunc(attrn(&fid,nvars)) %to 1 %by -1;
      %let target = %upcase(%sysfunc(varname(&fid,&i)));
      %if &flag %then %do;
         %let index = 1;
         %do %while (%length(%scan(&vars,&index)) > 0);
            %if &target = %scan(&vars,&index) %then
               %let _type_ = %eval(&_type_ + 2**(&flag - &i - 1));
            %let index = %eval(&index + 1);
         %end;
      %end;
      %else %if &target = _TYPE_ %then
         %let flag = &i;
   %end;
   %let fid = %sysfunc(close(&fid));
   &_type_
   %goto term;
%Exit:;
   -1
%term:
%mend findtype;

What I wanted to do was use this macro to create a format typedesc to look up from _type_ to a description, so that in the example above put(12, typedesc.) would return ‘month*age’. I created a dataset types containing all the variable combinations requested in the summary, and then I used this to create a control data set typefmt:

filename out_tmp temp ;

/*Write a data step to generate a format control set*/
data _null_;
  set types end=last;
  file out_tmp ;
  if _n_=1 then put 'data typefmt; retain fmtname type ; fmtname="typedesc"; type="N";' ;
  line='start=%findtype(cost_summary, '|| types ||'); label="'|| types ||'"; output ;' ;
  put line ;
  if last then put 'run ;' ;
run;

%include out_tmp /source2 ;

Here the variable types acts as the label, and %findtype finds the _type_ value to apply the label to.

From there it’s a simple matter of compiling the format and applying it to _type_ in the summary dataset:

proc format library=work cntlin=typefmt ;
run ;

data cost_summary ;
  set cost_summary ;
  type=put(_type_, typefmt.) ;
run ;
Advertisements