Maintaining SAS formats in Excel

For one of the subject areas I deal with day to day, categorical data values are often stored in the database in narrow character fields, so we need to apply SAS formats to get readable values out in reports. To maintain these formats I store them in an Excel workbook, looking roughly like this:

This is a relatively painless way to manage keeping SAS formats up to date, as well as making them easily accessible to people who need to inspect them. Each format is stored in a named range, with the name of the range matching the name of the format. The column headers are chosen to agree with the SAS Format procedure requirement for an input control data set. The above range, for instance, would be named ‘excel_user’, and would be used to input the required data for a character format (Type=’C’) also named ‘excel_user’ (Fmtname).

So how do I get them into SAS from Excel? Once again, the SAS/ACCESS Excel libname statement is helpful here. First, we assign a libref ‘wb’ to the workbook holding the format ranges, and another libref ‘sasprogs’ which will be used to store the format catalog:

libname wb Excel "&path.\formats.xls" ver=2002 ;
libname sasprogs "&path" ;

Next we need to get a list of all the ranges that we’re interested in here. In this workbook the only named ranges are the ones containing the formats. Any other names that the libname engine detects will be the names of worksheets, which will contain a ‘$’ character. So, we simply exclude these from the query against the dictionary.tables table:

proc sql;
  create table tmp as
  select memname
  from dictionary.tables
  where libname='WB' 
    and index(memname, '$') = 0 ;
quit;

So now the table ‘tmp’ holds a list of all the format range names in the workbook. Now we need to use this table to guide SAS in building those formats. What we need to do is to submit a proc format statement for every one of those names. Like so:

%macro load_fmt(name) ;
  proc format library=sasprogs.sasfmts cntlin=wb.&name ;
%mend load_fmt;

Now we simply need to loop through the set of names to load each of the formats into the sasprogs.sasfmts catalog:

data _null_;
  set tmp;
  call execute('%load_fmt('||memname||')') ;
run;

All that’s left to do now is a little tidying up:

/*Close the file*/
libname wb clear ;
/*Inspect format contents*/
proc format library=sasprogs.sasfmts fmtlib cntlout=sasprogs.sasfmt_data; run;
/*Add them into the search path*/
options fmtsearch=(library sasfmts);

And we’re done – all the formats are now loaded and available in the SAS session. Now if there are any changes, or a new format needs to be created, just add it into the file, save it and re-run this script. Sorted!

Populating Excel from SAS

A while back, I posted about using the SAS Excel libname engine to read and write to and from Excel files. That turned out to be a reasonably popular topic, and engendered quite a few questions, so I thought I’d follow up with an example of the typical use I’ve made of this facility, along with a couple of lessons learned along the way.

I usually have a pre-formatted template file set up, which I copy to a new file destination and then assign a libname:

data _null_;
    infile "&tmpltdir.\template.xls" recfm=n;  /*Template file*/
    input one_byte $char1. @@;
    file "&rptdir.\report.xls" recfm=n;        /*Report file*/
    put one_byte $char1. @@;
run;

libname wb Excel "&rptdir.\report.xls" ver=2002;

Copying like this (input/put) is a little clunky I think, but it works. X commands are not enabled by default from Enterprise Guide, so this is the best option remaining.

The data will be stored in named ranges set up for that purpose – but before they can be populated, they need to be cleared:

proc datasets library=wb;
  delete Listing Header;
run;
quit;

Listing in this case would be a tabular dataset, and Header is a one row set which contains information like report date, refresh date, titles etc. – any information which is particular to that edition of the report.

The next step is to populate the data in the ranges:

data Header ;
	format startdate enddate refreshdate reportdate date9.;
	progpath="&regref.\sasprogs\&pgmname";
	startdate=&startdate.d;
	enddate=&enddate.d;
	refreshdate="&dwh_d_fmt"d;
	reportdate="&sysdate9"d;
run;

data wb.Header (dblabel=yes);
	set Header;
run;

The code to populate Listing would run much the same way. One point to note here is that sometimes the order of the columns output is important. In this case, it is possible to control the order of columns by either using a retain statement in the data step, or by using proc sql and ordering the columns in the select clause.

Once the ranges have been populated, close the file by clearing the libname:

libname wb clear;

This is important as otherwise the file will be held open by SAS.

And there you have it – the report has now been populated. Actually, there’s a further wrinkle I should mention. For some reason, the report file created often suffers from a peculiar “bloating” effect, where the file size appears to be roughly double what it should be. However, once the file is opened and saved in Excel, the size returns to normal. In a couple of cases involving large numbers of regular reports, I have written a VBA routine to open and save each one in turn…as with the copy step above, I find it a little clunky, but it can save a considerable amount of space.

Accessing Dynamic Named Ranges

I was planning to continue the series with a post on accessing Excel data from Python – this is a brief diversion related to a problem I’ve encountered recently with using SAS to read Excel files.

I have a bunch of Excel files which I need to read every month. The relevant data is contained in a dynamic named range – that is, a name where the RefersTo string is typically a formula of the form

=OFFSET(fixedCell,0,0,COUNTA(columnInRange),rangeWidth)

This is useful when the amount of data varies from file to file – in this case, some files will have 5 rows, and some will have 5,000. The use of the COUNTA() function means that the number of rows in the range is evaluated by Excel, hence the height of the range adjusts automatically. Which is great, when you’re reading those ranges in Excel, but which can cause problems when you’re trying to get at them without the benefit of the Excel calculation engine. Case in point – the read_file macro from a previous post which uses the SAS Excel libname engine, fails to read dynamic ranges. I’m not 100% sure that it’s impossible to do so (the wonderful folks at SAS tech support are looking into it for me), but it has always failed so far.

So, I need a workaround. I came up with the following – it’s a bit of a clunker, but it works. The basic idea: run an Excel macro over the files which copies the data from the dynamic range into a static range in a new workbook.

Private Sub ReadFile(ByVal sBookName As String)
' Reads sitting time files in INPATH, copies all time entries
' to a new workbook in OUTPATH
Dim wbkFile As Excel.Workbook
Dim wbkOut As Excel.Workbook
Dim rngData As Excel.Range
Dim rngOut As Excel.Range
Dim sBranchName As String
Dim sMonth As String


' Specify required values for output range
Set wbkFile = Application.Workbooks.Open(INPATH & sBookName)
Set rngData = wbkFile.Names("rngTimeData").RefersToRange
sBranchName = wbkFile.Names("BranchName").RefersToRange.Value
sMonth = Format$(Evaluate(wbkFile.Names("Month").RefersTo), _
        "mmyy")

' Add new workbook and specifies range to dump data into
Set wbkOut = Application.Workbooks.Add()
Set rngOut = wbkOut.Worksheets(1).Cells(1, 1)
Set rngOut = rngOut.Resize(rngData.Rows.Count, _
        rngData.Columns.Count)
' transfer values across, name the range, save and close the new
' workbook
With rngOut
    .Value = rngData.Value
End With
wbkOut.Names.Add "TimeData", rngOut


wbkFile.Close False
sBranchName = Replace$(sBranchName, " ", "_")
sBranchName = Left$(sBranchName, InStr(sBranchName, "_DC") - 1)
wbkOut.SaveAs OUTPATH & sBranchName & "_" & sMonth, _
        xlWorkbookNormal
wbkOut.Close


End Sub

Public Sub GetFiles()
' Read all files in INPATH, save copies into OUTPATH and check 
' which ones are missing
Dim fsObj As Scripting.FileSystemObject
Dim inFolder As Scripting.Folder
Dim inFile As Scripting.File


Application.ScreenUpdating = False


Set fsObj = CreateObject("Scripting.FileSystemObject")
Set inFolder = fsObj.GetFolder(INPATH)


For Each inFile In inFolder.Files
    ReadFile inFile.Name
Next inFile


OutputDirectoryContents


Application.ScreenUpdating = True


End Sub

OutputDirectoryContents is a macro which writes the names of all the files in the directory OUTPATH to the main worksheet, then runs an advanced filter over the table of files that should be in there, returning those that aren’t (the Criteria range is set to File Found =FALSE):

Private Sub OutputDirectoryContents()

' Lists contents of OUTPATH on the front sheet and returns any
' missing filenames
Dim fsObj As Scripting.FileSystemObject
Dim outFolder As Scripting.Folder
Dim outFile As Scripting.File
Dim wksht As Excel.Worksheet
Dim rngFiles As Excel.Range
Dim startCell As Excel.Range
Dim rngSource As Excel.Range
Dim rngCriteria As Excel.Range
Dim rngExtract As Excel.Range


' set required variables...
Set fsObj = CreateObject("Scripting.FileSystemObject")
Set outFolder = fsObj.GetFolder(OUTPATH)
Set wksht = ThisWorkbook.Worksheets(1)
Set rngFiles = ThisWorkbook.Names("Files_Found").RefersToRange
Set startCell = rngFiles(1)
Set rngSource = ThisWorkbook.Names("FileWishlist").RefersToRange
Set rngCriteria = ThisWorkbook.Names("rngCriteria").RefersToRange
Set rngExtract = ThisWorkbook.Names("rngExtract").RefersToRange


' Clear, then repopulate, range of files found in the specified folder
rngFiles.ClearContents


startCell.Value = Format$(DateAdd("m", -1, Date), "mmm-yyyy")
For Each outFile In outFolder.Files
    Set startCell = startCell.Offset(1)
    startCell.Value = outFile.Name
Next outFile


' Refresh filter to see which ones are missing...
RefreshAdvancedFilter rngCriteria, rngExtract, rngSource, , True


End Sub

Public Sub RefreshAdvancedFilter(ByRef rngCriteria As Excel.Range, _
               ByRef rngExtract As Excel.Range, _
               ByRef rngSource As Excel.Range, _
               Optional ByVal IsCopy As Boolean = True, _
               Optional ByVal IsUnique As Boolean = False)
' Refreshes advanced filter of the source range
Dim lCopy As Long


lCopy = IIf(IsCopy, xlFilterCopy, xlFilterInPlace)


rngSource.AdvancedFilter Action:=lCopy, criteriarange:=rngCriteria, _
    copytorange:=rngExtract, unique:=IsUnique


End Sub

Once I’m sure that all the files are there that need to be, I can run the SAS macro get_all_files over the contents of OUTPATH to create the datasets I want. Slightly convoluted (and a little wasteful in that the data is duplicated), but it gets the job done, so I’m happy.

Conversations with Excel, part 1

One of our most frequent requirements at work is transporting data between SAS and Excel. For the most part with our ad-hoc reporting the ‘conversation’ is one-way: the extraction and transformation is performed in SAS, then the data is output to a file destination in a format that Excel can read, via ods to csv, html, xml, or tab-delimited text files. The output file can then be opened in Excel and any after-work analysis or presentation performed there – formatting, worksheet formulae, pivot tables and charts can be added.

However, with regular reports it would be nice to automate a lot of this after-work by keeping a template with the formatting, formulae etc already complete, and having SAS simply update the underlying data. This can be done manually via a copy-paste operation from the output file or by using VBA to automate the transfer, and this is perfectly satisfactory – but this is not the only way. A lot of our legacy code uses the very useful but also slightly outdated DDE protocol – I won’t go into the detail of how DDE is used in SAS (usually through a filename statement) but there are plenty of good references available, for example here. DDE allows SAS to control the Excel application via the Excel 4 macro language, to open Excel files and to specify locations within those files to read from or write to. The only real problem we have with DDE is that, as we use SAS Enterprise Guide on the desktop as a thin client and SAS is on the server, all that legacy code is unable to find the Excel application to talk to.

The SAS v9 Excel libname engine (in the SAS/ACCESS package) to a large extent fills the gap left by DDE, without the need to control an instance of the Excel application. The libname statement creates a library out of the Excel workbook, where the tables in the library are the worksheets and named ranges in the workbook. This makes the code relatively simple – for example, I use the following macro to read the range ‘rngData’ from the file ‘fname’ into the SAS dataset ‘fshortname’:

%macro read_file(fname);
*** Libnames ;
* sas dataset directory;
libname sasdata "&path.\&adhocref.\sasdata" ;

* Open file and create libref;
libname returnWB EXCEL "&inputdir.\&fname" USEDATE=NO;

* strip off the file extension;
%let namelen=%eval(%length(&fname)-4);
%let fshortname=%substr(&fname, 1, &namelen);

data sasdata.&fshortname.;
  set returnWB.rngData;
run;

* Close file and clear libname;
libname returnWB clear;

%mend read_file;

In this application, I then use the SAS functions dopen, dnum, dread and dclose to iterate over files in the source directory and create a bunch of SAS datasets from their specified ranges:

%macro get_all_files;

%let fref=flist;

%let dname=%sysfunc(filename(fref, &inputdir));

%let did=%sysfunc(dopen(&fref));

%let numfiles=%sysfunc(dnum(&did));

%do i=1 %to &numfiles;

  %read_file(%sysfunc(dread(&did, &i)));

%end;

%let dname=%sysfunc(dclose(&did));

%mend get_all_files;

The advantage of having a libname created is then that the data can be viewed and manipulated in much the same way as any other SAS dataset. The conversation can of course go the other way, simply by changing the order of the dataset references in the data step:

%macro writeto_file(fname);
*** Libnames ;
* sas dataset directory;
libname sasdata "&path.\&adhocref.\sasdata" ;

* Open file and create libref;
libname returnWB EXCEL "&inputdir.\&fname" USEDATE=NO;

* strip off the file extension;
%let namelen=%eval(%length(&fname)-4);
%let fshortname=%substr(&fname, 1, &namelen);

data returnWB.rngData;
  set sasdata.&fshortname.;
run;

* Close file and clear libname;
libname returnWB clear;

%mend writeto_file;

The macro now creates the range rngData in the workbook (assuming it doesn’t already exist – if it does this method will not work). The article De-Mystifying the SAS Libname Engine in Microsoft Excel contains some great examples of how to access, replace, create and delete ranges in Excel workbooks.

Next time I’ll look at an alternative: SQL pass-through (also through SAS/ACCESS), and the reason this may be preferable.