The perils of large spreadsheets


Ok, we’ve all been there. What started out as a relatively compact workbook has blossomed into a gargantuan mess that requires calculation set to Manual to change anything, and takes a full minute just to open. It’s not nice, particularly when the workbook in question is one that is relied upon heavily. I “owned”, or should I say “inherited”, one of these a couple of years back, which was used by over a hundred people in a call centre – and loathed by all of them. I didn’t like it much either, as I had to field all the calls and emails asking for it to be “fixed”.

So what to do? In a sense, of course, there’s nothing to fix, as long as everything does recalculate as it should. If not, or worse, if Excel crashes when it’s being used, then there is a problem, which usually warrants a complete redesign. Unfortunately that happens more often than the average user realises. While Excel’s memory management is continually improving with new releases (and the average memory available to it is increasing), you can only push it so far before it gives up the ghost. However, let’s assume for the sake of argument you’re not in that awful place right now. So while there’s nothing broken as such, the performance you’re getting out of it is not what it should be. What can you do to improve it? Here’s a couple of things to try.

  • Cull unnecessary data. There are cases where workbooks (particularly some financial models) do require several sheets with thousands of rows. But it’s worth questioning whether it’s all absolutely necessary. And does everything need to be in the same workbook? It may be that you can get significant gains by splitting the workbook into two or more pieces, which can then be hyperlinked – and by grouping related data this can make for a more user-friendly tool.
  • Re-use the same cache for your pivot tables if you can. Excel will very helpfully prompt you to do just that when it realises you’re about to create a pivot table using the same range or database that forms the cache for another pivot table. It also explains why – you will save memory and your workbook will be smaller. This is almost certainly a good thing. And don’t make the mistake of creating a separate cache for a subset of a previously used range (another common error) – you can always use the whole thing and exclude what you don’t want from within the pivot table after it’s been created.
  • Avoid formula repetition, wherever possible. Too many people take the “Jurassic Park” approach to building formulas, and assume that just because it is possible to do something, they should do it. Not true – here’s an example: =IF(VLOOKUP($M6,$B$2:$H$100,3,FALSE)>AVERAGE($B$2:$B6),
    VLOOKUP($M6,$B$2:$H$100,3,FALSE)-AVERAGE($B$2:$B6),
    VLOOKUP($M6,$B$2:$H$100,3,FALSE)+AVERAGE($B$2:$B6))

    Now you may ask, what’s wrong with that? Well, look at it from Excel’s point of view:
    Excel: So what do you want me to do?
    You: Well, first look up this thing in the first column of that table, and go along 3 columns from it and get the value there.
    Excel: Got it – easy!
    You: Good. Now take the mean of these numbers here and compare it with the number you just got. If it’s smaller, then look up this thing in the first column of that table, go along 3 columns and get the value there.
    Excel: Wait – you asked me to do that already.
    You: Well now I want you to do it again.
    Excel: OK then – it’s your party.
    You: Right. Now take the mean of these numbers here,…
    Excel: Seriously? I just did that before.
    You: Do it again, and subtract it from the number you just got.
    Excel: Alright – done. What if the number wasn’t smaller?
    You: Well, in that case, look up this thing in this table here…
    Excel: I can feel a migraine coming on…

    You get the picture. Of course, the problem is that Excel won’t complain when you ask it to do something silly like this – but any reasonable person would suggest that you calculate the results of the VLOOKUP and AVERAGE functions once, in separate cells, and then use the values however you like, as many times as you like, without creating a job that takes twice as much recalculation as would otherwise be necessary.
  • A related issue is the (over)use of VLOOKUP. Where you need to look up the same value more than once, even in separate cells, returning data from different columns, Excel is still having to repeat the same task unnecessarily. MATCH is your friend here, in combination with either OFFSET or INDEX. For example:
    A1=(VLOOKUP(M6,$B$2:$H$100,3,FALSE)
    A2=(VLOOKUP(M6,$B$2:$H$100,4,FALSE)
    A3=(VLOOKUP(M6,$B$2:$H$100,7,FALSE)

    could be replaced with:
    A1=MATCH(M6,$B$2:$H$100,0)
    A2=INDEX($D$2:$D$100,A1)
    A3=INDEX($E$2:$E$100,A1)
    A4=INDEX($H$2:$H$100,A1)

    Over big tables this can represent a huge saving in calculation time.

And that’s all I’ve got time for right now. Good luck!

Advertisements

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out / Change )

Twitter picture

You are commenting using your Twitter account. Log Out / Change )

Facebook photo

You are commenting using your Facebook account. Log Out / Change )

Google+ photo

You are commenting using your Google+ account. Log Out / Change )

Connecting to %s