Names in Excel

First off, I should apologise for the complete lack of posts over the last few weeks. It’s been a rather busy time for a month or so now, and while there hasn’t been a lack of things to write about, there has certainly been a lack of energy to actually write about them.

Anyway, I posted a few weeks back about the modelling life cycle, where I talked about the desired outcomes of implementing a standard ‘best practice’ modelling process. One of these was effective documentation of the model – effective in that everyone who needs to build, use, test and maintain a model has access to the information they need to understand its intent. The documentation should primarily be external to the model itself, but in my opinion, a best practice model design should also adhere to the principles of self-documenting code. That is, the model should be built in a way that is easy to understand, use and extend without the need for extensive reference to external documentation.

Names in Excel are a key part of that approach. While it is entirely possible to build a model which lives up to its specification without using names at all, I can almost guarantee that appropriate and judicious use of names would shorten development time, produce formulas that are easier for the tester and/or user to understand, and produce something that would be easier to extend or modify. There are many references out there to how names can be used in Excel, so without trying to repeat or summarise all of those, I’ll just highlight what I see as the main benefits of each technique in this context.

The first and most obvious way to use names is in defining named ranges. A cell named inpDiscountRate is meaningful to both a developer and a user – the prefix inp signals that the range is an input or assumption, and DiscountRate tells them what it denotes in the context of the model. This means that a formula referencing the discount rate in the model is far more comprehensible at first glance than something which has a reference to, say, $B$6. It’s also helpful to be able to name lookup tables and arrays used in the model. For instance, in a table where an ID field is located in column A, there are a couple of ways to retrieve the ID value from elsewhere in the table. One is to name the whole column (excluding the header), maybe using a dynamic range name, defining ptrID as

=OFFSET($A$1,1,0,COUNTA($A:$A)-1,1)

Referring to ptrID (the ptr prefix denotes a pointer) in a formula in another column will then pick up the value in the ID column by implicit intersection. Another way to accomplish the same feat is to define a relative range name, again ptrID, as $A6, where the active cell is, say, B6. The reference is row-relative, so that a reference to ptrID picks up the cell in the same row as the one the formula is entered in, but always looks in column A. I prefer using relative references as they’re less work to code, but either way we get the same benefit: a formula which has more meaning than one which relies on cell addresses.
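To make the relative-name version concrete, here is a minimal sketch (the table layout and the inpRateTable name are invented purely for illustration). With B6 as the active cell, ptrID would be defined as

=$A6

and a formula such as

=VLOOKUP(ptrID,inpRateTable,2,FALSE)

entered anywhere in row 6 and filled down the column will always take its lookup value from column A of its own row, while the formula text reads identically in every row.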

Names can also contain constants, thereby eliminating the need for using ‘magic numbers’ in formulae. Using

=ptrCellToLeft*inpDiscountRate^(1/cNumPeriods)

where cNumPeriods is defined to have the value 4, means that not only is the logic in the formula more transparent than it would be otherwise, but it is also more flexible. A change in granularity from quarterly to monthly requires no more than changing the value of cNumPeriods to 12.
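To spell that out, cNumPeriods is nothing more than a name whose Refers To box contains

=4

and the switch from quarterly to monthly granularity is a single change to

=12

with no formulas edited at all.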

Finally, a name can also encapsulate a formula. The benefit I see in a named formula is principally that it can serve to simplify otherwise unwieldy calculations. However, I prefer not to make too much use of them in this way, as in my opinion they can also serve to obscure the logic involved. In most cases, if a formula is too long to understand easily at first reading, a better approach to simplifying it is to split it into two or more cells. Assuming those cells are themselves assigned names, the end result will look pretty similar, but it is easier for the reader to access the logic. Also, this approach could conceivably improve performance – a really long formula can have a lot of dependencies, so by splitting it up we can reduce the amount of recalculation required when any one of those changes.
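As a rough sketch of what I mean (the rngRegion, rngYear, rngSales and calcFilteredSales names are invented for the example), instead of one cell holding

=SUMPRODUCT((rngRegion=inpRegion)*(rngYear=inpYear)*rngSales)*inpDiscountRate^(1/cNumPeriods)

the intermediate result can live in its own cell, named calcFilteredSales, containing

=SUMPRODUCT((rngRegion=inpRegion)*(rngYear=inpYear)*rngSales)

which leaves the final cell with just

=calcFilteredSales*inpDiscountRate^(1/cNumPeriods)

Each step now has a name, can be inspected and tested on its own, and only recalculates when its own precedents change.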

I should also mention the excellent Name Manager add-in, available from JKP Application Development Services. Working with the Insert Names dialog can be very time-consuming, but this add-in greatly simplifies the creation, modification and documentation of names. The addition of the built-in Name Manager in Excel 2007 is a great improvement, but as was pointed out in a comment on an earlier post, it still doesn’t equal the functionality of this very handy add-in.

The modelling life cycle

Recently I decided to propose the use of a framework in the spreadsheet modelling process in my business unit. The idea for the framework is based on the slightly dated but still excellent document Spreadsheet Modelling Best Practice, available for download through the European Spreadsheet Risk Interest Group website. Basically, my motivation was provided by one particular modelling process I saw in action recently, where someone with no real skills in spreadsheet design (let’s call him Noddy) was tasked with building a model. This model is going to inform several key internal and external stakeholders in our business about the downstream effects of a significant change to a business process. High visibility, high risk. The thing is, I know I’m going to be called upon at some stage to help out with this, and I also know it’s going to be a hard ask to get details from Noddy – I’ve seen some of what he calls documentation, and it’s nowhere near as specific as it needs to be.

With all this in mind, the question for me is: how can we avoid this situation in the future? The Best Practice paper seemed to offer a nice framework for guaranteeing two things in any modelling process:

  • Constructive engagement from everyone that needs to be engaged; and
  • Documentation which allows for ease of testing, maintenance and re-use.

The core idea is simple – any model built according to best practice goes through a six-stage life cycle, as follows:

  1. Scope
  2. Specify
  3. Design
  4. Build
  5. Test
  6. Use

This is iterated as required. The outcomes from each stage are concrete and serve as inputs into later stages. For instance, the outcome of the specification stage is an unambiguous statement of how the model will calculate the required results; this guides how the model is designed and then built, and serves as a framework to test against.

One of the key weaknesses with our current setup is that people like Noddy often charge into the build stage without any regard for the first three stages, which in my opinion are the most important to getting the model right. Once you know the questions the model has to answer, have the data you need, know exactly how you’re going to answer those questions and know what the model is going to look like, building it in Excel is a trivial process (given, of course, a fair knowledge of worksheet functions and the Excel UI).

I, too, am in spreadsheet hell

Ok, let me start by saying that the use of the term “hell” is probably a little too strong for the situation I’m about to describe. But I just stumbled across an old post in Smurf on Spreadsheets linking to a paper written by the author, which struck an immediate chord with me. Without reviewing the contents of the paper, I’ll just mention a couple of the points which stood out:

  • The “key man” dependency problem – critical spreadsheets and systems of spreadsheets in many corporate environments have been built, or at least designed, principally by one person. This one person hands the system over to the prospective business owner or end user with either little or no documentation, technical or otherwise. No-one else has the same perspective on the overall structure of the system, or sufficient knowledge of its inner workings to be able to support it. So when that person leaves the building, where does that leave the user when something goes wrong? Not in a good place…
  • External linking – It’s an extremely common (and risky) practice to link related spreadsheets within a system or application in a rigid, predetermined fashion in order to collate, summarise or manipulate data, slicing and dicing on the way to producing a presentation layer (perhaps a management dashboard or other regular report) at the end. This often means that the accuracy of the results and/or the completion of the process is extremely sensitive to both the structure of the data and to the containing file structure. As a result, an error in the process often means a protracted, tortuous investigation to trace the point at which it originated.

So my situation relates to both of these points. When I started in my current job, about a year ago, I was designated as the support person for a monthly business reporting process in another team. My predecessor had been the person who “designed” this process, which involves about 60-70 spreadsheets in total, 30-odd of which are raw data entered into templates, with the rest summarising this data in various ways, resulting in 3 or 4 key reports for various consumers, mostly upper management.

The issue I have is that the process is, overall, a complete mess. The importing of data and regeneration of new consolidated reports is handled by a bunch of VBA routines, which is why I was nominated as the support person (Geoff knows VB, he can do that). Imagine my horror to discover that the ongoing maintenance of the code required, each and every month, copying and pasting every single routine used the previous month and then replacing within it every single reference to that month – month names, dates, file paths, everything. There were no variables used in the code, anywhere – it had basically all been recorded, then updated, every month for the last three years. Anecdotal evidence suggested the bloke who was doing this spent around 3-4 days every month on this and a couple of other similar processes.

Well, I was having none of that, so I spent around 3 weeks painstakingly untangling the intent of the code, and then refactoring, until finally I was left with something that requires no regular maintenance. This now leaves the real problem to deal with, which does require regular attention.
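To give a flavour of what the refactoring involved (a simplified sketch only, not the real code – the routine name, paths and file names below are invented), the general idea is that everything month-specific gets derived from a single date rather than typed into the code:

Sub ImportMonthlyData()
    ' Sketch only – names and paths are invented for illustration
    Dim reportMonth As Date
    Dim monthLabel As String
    Dim dataPath As String

    ' Derive every month-specific string from one date (here, the previous month)
    reportMonth = DateSerial(Year(Date), Month(Date) - 1, 1)
    monthLabel = Format(reportMonth, "mmmm yyyy")            ' e.g. "March 2008"
    dataPath = "S:\Reporting\" & Format(reportMonth, "yyyy mm") & "\"

    ' The same variables drive every open/import/consolidation step
    Workbooks.Open dataPath & "Raw Data " & monthLabel & ".xls"
    ' ... consolidation steps ...
End Sub

Because nothing month-specific is hard-coded, there is nothing to find and replace next month.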

Most of the cells in the monthly summaries represent the last link in a chain of dependency stretching back over several workbooks. It works, mostly – but where it doesn’t, leaving incorrect or #REF! values, it is a real pain in the arse finding where it’s gone wrong. The person who runs the reports calls me up most months to report just that happening, and investigation often reveals nothing more than links not being updated. Or occasionally I’ll discover a link pointing to the wrong place, or a template incorrectly filled out. What the process needs is essentially a complete redesign, but given the current constraints on budget in our business, it’s not going to happen anytime soon. In fact, I can see this stretching out over the next 5-10 years, and when I’ve gone, the next person to support it will be in much the same situation I was. I’ve made some attempts at documentation, but I really don’t have the time now for anything else.
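Something along the lines of the sketch below can at least shortcut part of the hunt. It simply lists each external Excel link in a workbook and flags any whose source file no longer exists at the linked path (it won’t catch a link that points to the wrong-but-existing file, or a badly filled template):

Sub ListExternalLinks()
    ' Sketch: list the active workbook's external Excel links and
    ' flag any whose source file is missing from the linked path
    Dim links As Variant
    Dim i As Long

    links = ActiveWorkbook.LinkSources(xlExcelLinks)
    If IsEmpty(links) Then
        Debug.Print "No external links found"
        Exit Sub
    End If

    For i = LBound(links) To UBound(links)
        If Dir(links(i)) = "" Then
            Debug.Print "MISSING: " & links(i)
        Else
            Debug.Print "OK: " & links(i)
        End If
    Next i
End Sub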