The perils of large spreadsheets

Ok, we’ve all been there. What started out as a relatively compact workbook has blossomed into a gargantuan mess that requires calculation set to Manual to change anything, and takes a full minute just to open. It’s not nice, particularly when the workbook in question is one that is relied upon heavily. I “owned”, or should I say “inherited”, one of these a couple of years back, which was used by over a hundred people in a call centre – and loathed by all of them. I didn’t like it much either, as I had to field all the calls and emails asking for it to be “fixed”.

So what to do? In a sense, of course, there’s nothing to fix, as long as everything does recalculate as it should. If not, or worse, if Excel crashes when it’s being used, then there is a problem, which usually warrants a complete redesign. Unfortunately that happens more often than the average user realises. While Excel’s memory management is continually improving with new releases (and the average memory available to it is increasing), you can only push it so far before it gives up the ghost. However, let’s assume for the sake of argument you’re not in that awful place right now. So while there’s nothing broken as such, the performance you’re getting out of it is not what it should be. What can you do to improve it? Here’s a couple of things to try.

  • Cull unnecessary data. There are cases where workbooks (particularly some financial models) do require several sheets with thousands of rows. But it’s worth questioning whether it’s all absolutely necessary. And does everything need to be in the same workbook? It may be that you can get significant gains by splitting the workbook into two or more pieces, which can then be hyperlinked – and by grouping related data this can make for a more user-friendly tool.
  • Re-use the same cache for your pivot tables if you can. Excel will very helpfully prompt you to do just that when it realises you’re about to create a pivot table using the same range or database that forms the cache for another pivot table. It also explains why – you will save memory and your workbook will be smaller. This is almost certainly a good thing. And don’t make the mistake of creating a separate cache for a subset of a previously used range (another common error) – you can always use the whole thing and exclude what you don’t want from within the pivot table after it’s been created.
  • Avoid formula repetition, wherever possible. Too many people take the “Jurassic Park” approach to building formulas, and assume that just because it is possible to do something, they should do it. Not true – here’s an example: =IF(VLOOKUP($M6,$B$2:$H$100,3,FALSE)>AVERAGE($B$2:$B6),

    Now you may ask, what’s wrong with that? Well, look at it from Excel’s point of view:
    Excel: So what do you want me to do?
    You: Well, first look up this thing in the first column of that table, and go along 3 columns from it and get the value there.
    Excel: Got it – easy!
    You: Good. Now take the mean of these numbers here and compare it with the number you just got. If it’s smaller, then look up this thing in the first column of that table, go along 3 columns and get the value there.
    Excel: Wait – you asked me to do that already.
    You: Well now I want you to do it again.
    Excel: OK then – it’s your party.
    You: Right. Now take the mean of these numbers here,…
    Excel: Seriously? I just did that before.
    You: Do it again, and subtract it from the number you just got.
    Excel: Alright – done. What if the number wasn’t smaller?
    You: Well, in that case, look up this thing in this table here…
    Excel: I can feel a migraine coming on…

    You get the picture. Of course, the problem is that Excel won’t complain when you ask it to do something silly like this – but any reasonable person would suggest that you calculate the results of the VLOOKUP and AVERAGE functions once, in separate cells, and then use the values however you like, as many times as you like, without creating a job that takes twice as much recalculation as would otherwise be necessary.
  • A related issue is the (over)use of VLOOKUP. Where you need to look up the same value more than once, even in separate cells, returning data from different columns, Excel is still having to repeat the same task unnecessarily. MATCH is your friend here, in combination with either OFFSET or INDEX. For example:

    could be replaced with:

    Over big tables this can represent a huge saving in calculation time.

And that’s all I’ve got time for right now. Good luck!


Stand Back – I know Regular Expressions

Part 1 of 4

Yes, it’s true, I do – and that post has to be one of my favourite from the xkcd strip. I’ve used them many times over the last couple of years, in projects ranging from Python scripts written on weekends to critical SAS scripts I’ve written for work. So what are they?

Well, I’m not going to get into all of the gory details here, I’ll leave that to Wikipedia. But for the purpose of a quick and simple explanation, regular expressions are a very powerful tool used to match text. You know how, when you’re using the autofilter in Excel, you can match an arbitrary string of characters by using the wildcard *, or a single arbitrary character using ? So for instance, you could match the words “retail” and “detail” with the expressions ?etail or *ail.

Well, regular expressions offer the ability to do that sort of matching – but they offer far more besides. The core idea is that a regular expression pattern offers a compact and unambiguous way to denote a set of character strings. For example, the pattern \b\w+(e|d)\b denotes the set of all words which end with either e or d. And the pattern \d{3,5}\s+(Jim|Bob) matches all strings of digits with length at least 3 and at most 5, followed by whitespace, followed by either “Jim” or “Bob”.

As I said, I’m not going to get into all the details right here, so I’ll stop short of a proper exploration of regular expression syntax, which would be available as part of most introductory tutorials for Perl, Ruby or Python, to name but a few. However, I do want to outline how all this relates to my current project. Actually, in a sense, it is my current project.

You see, as soon as I discovered regular expressions, I started wondering why such a powerful tool was not implemented in as versatile an application as Excel, with all of its bells and whistles. While we do have things like the Find dialog, the MATCH() function and the Auto and Advanced filters, the ability to harness the power and simplicity of regular expression pattern-matching would really come in handy on occasion. Well, it turns out that the ability is there – it’s just slightly hidden. Microsoft included a regular expression engine in VBScript back in 1999, the latest version of which can be used in a VBA project by adding a reference to it, like so:


Add a reference to the Microsoft VBScript Regular Expression Library

Add a reference to the Microsoft VBScript Regular Expression Library

Now, while all the power of this engine is available to the VBA developer, my thinking is that it would be helpful to make this available where the rubber meets the road for the majority of Excel users – on the worksheet. My idea is to create a suite of worksheet functions and forms which can be used as an alternative to built-in functions, allowing users to harness the power of regular expressions. Next time, I’ll detail the functions I intend to include, with a brief summary of their implementation.

The beginning…

Welcome everyone. I’ve been fooling around with some thoughts on Excel recently, and I would like to share them on here, with you. I’ll have a little more content coming out over the next few weeks as I detail work on my most recent project – it’s ongoing and I expect a few twists and turns before I get everything just right. I’ll also share some of my thoughts on spreadsheet design, VBA and programming in general. Soon to come – a summary of the what, where, how and why? of my current obsession.