In SAS I frequently make use of the very handy SUMMARY procedure. It will produce a wide range of aggregate results at whatever class level you specify – or multiple levels if you like. So given a dataset costs with measure variable total_cost and class variables month, age, tenure and location, I might wish to see one-way analyses of cost by all of these classes, and possibly two-way analysis of month vs age and month vs tenure. So I could write a whole lot of summary procedures like the below:
/*Month 1-way analysis*/ proc summary data=costs nway missing ; class month ; var total_cost ; output out=cost_month_summary (drop=_type_ rename=(_freq_=count)) min= max= p5= q1= median= q3= p95= mean= sum= /autoname ; run; /*Month by Age 2-way analysis*/ proc summary data=costs nway missing ; class month age; var total_cost ; output out=cost_month_age_summary (drop=_type_ rename=(_freq_=count)) min= max= p5= q1= median= q3= p95= mean= sum= /autoname ; run; /*etc, etc.*/
Now while that will work, it’s a lot of code to write and maintain. I prefer to request all the required results at once, in one proc:
proc summary data=costs missing ; class month age tenure location; var total_cost; types /*One-way analyses*/ month age tenure location /*two-way*/ month*age month*tenure ; output out=cost_summary (rename=(_freq_=count)) min= max= p5= q1= median= q3= p95= mean= sum= /autoname ; run;
The change that makes this work is of course the type statement, and removing the nway option (we no longer want an n-way analysis).
The output dataset will then have a column for every class variable, one for each requested statistic (including the renamed _freq_), and crucially we’ve also kept the autogenerated _type_ variable. _Type_ is the variable which tells you which summary request you’re looking at, assigning numbers to each combination of variables according to a simple binary pattern. Basically given n class variables arranged in logical order A1 … An in the output dataset, working from right to left each is assigned a value from 20 … 2n-1, so Am gets 2n-m. In our example, if month, age, tenure and location have logical positions 1, 2, 3 and 4 respectively, then month=23, age=22, tenure=21 and location=20. That means the requested analysis types get the following _type_ values:
month = 23 = 8
age = 22 = 4
tenure = 21 = 2
location = 20 = 1
month*tenure = 23 + 21 = 10
month*age = 23 + 22 = 12
This works fine for me, but the _type_ variable is a number, which is unhelpful to an audience without the technical knowledge, time or patience to decode it, so it would be great to be able to systematically assign it a descriptive label. Of course, given a combination of class variables in a summary dataset, it’s possible to work out the value of _type_ that has been assigned to them. Turns out, this paper from SUGI 31 by Dan Bruns contains the macro %findtype written by Ron Coleman which will examine a summary dataset table and return the value of _type_ for the variable list vars:
%macro findtype(table,vars); %local index fid flag target; %let fid = %sysfunc(open(&table,i)); %if not &fid %then %goto exit; %let vars = %upcase(&vars); %let flag = 0; %let _type_ = 0; %do i = %sysfunc(attrn(&fid,nvars)) %to 1 %by -1; %let target = %upcase(%sysfunc(varname(&fid,&i))); %if &flag %then %do; %let index = 1; %do %while (%length(%scan(&vars,&index)) > 0); %if &target = %scan(&vars,&index) %then %let _type_ = %eval(&_type_ + 2**(&flag - &i - 1)); %let index = %eval(&index + 1); %end; %end; %else %if &target = _TYPE_ %then %let flag = &i; %end; %let fid = %sysfunc(close(&fid)); &_type_ %goto term; %Exit:; -1 %term: %mend findtype;
What I wanted to do was use this macro to create a format typedesc to look up from _type_ to a description, so that in the example above put(12, typedesc.) would return ‘month*age’. I created a dataset types containing all the variable combinations requested in the summary, and then I used this to create a control data set typefmt:
filename out_tmp temp ; /*Write a data step to generate a format control set*/ data _null_; set types end=last; file out_tmp ; if _n_=1 then put 'data typefmt; retain fmtname type ; fmtname="typedesc"; type="N";' ; line='start=%findtype(cost_summary, '|| types ||'); label="'|| types ||'"; output ;' ; put line ; if last then put 'run ;' ; run; %include out_tmp /source2 ;
Here the variable types acts as the label, and %findtype finds the _type_ value to apply the label to.
From there it’s a simple matter of compiling the format and applying it to _type_ in the summary dataset:
proc format library=work cntlin=typefmt ; run ; data cost_summary ; set cost_summary ; type=put(_type_, typefmt.) ; run ;