Regular Expressions – new version

Just a quick post tonight to let you all know, I’ve added a new page for downloads, which contains a new version of the regular expressions add-in, compatible with Excel 2007 and later. I’ve added in a couple of utility functions for convenience (mine more than yours but you might find them useful), and a form to evaluate regular expressions against test strings. And there’s even documentation!

The documentation gives a (very) brief intro to the uses and abuses of regular expressions, a run-down of the worksheet functions in the add-in and some examples of their use. Here are a couple of those, I hope you find them useful.

Matching cells which contain variations on a word

There are some words in the English language which Americans, god bless them, spell in their own special way. However, given input on the spreadsheet from users who spell both ways (correctly and incorrectly), you may wish to match both variations of words like ‘realise’ (‘realize’) and ‘colour’ (‘color’).
The pattern to match realise/realize is simple: \breali(s|z)e\b
The word boundary markers ensure we are looking at a complete word, and the alternation of (s|z) means that we match both versions.
Applying the ISRXMATCH formula demonstrates this is successful:

Validating Email Addresses

Given a list of email addresses in a column on a spreadsheet, we wish to ensure that these stick to a form which at least obeys some of the rules governing the format of email addresses. As these are going to be used by a script to send emails, we wish to minimise the number of undeliverable responses due to invalid addresses. The basic rules we specify for these addresses are as follows:
The username part of the address contains one or more alphanumeric characters, and possibly some additional special characters. This is followed by a single @ sign, followed by the domain name, which consists of one or more alphanumeric and special characters, ending with a dot followed by the top-level domain. This must contain only alphanumeric characters, and there must be between 2 and 6 of these. The address should be the entire content of the cell, so the beginning and ending anchors are used at the start and end of the pattern. Case is unimportant, so the case_sensitive flag is set to false.
The pattern is as follows: ^[a-z0-9_%.+-]+@[a-z0-9-.]+\.[a-z]{2,6}$
This is then used in ISRXMATCH – a valid email address according to our rules above will return true:

The second address in the list fails due to the whitespace in the username, whereas the fourth fails because the domain name does not include a top-level domain part of a dot followed by 2-6 letters.
I borrowed this regex from As well as a couple of alternative regexes to cover some edge cases which the above doesn’t catch, this page also discusses why email addresses can be tricky, and why you shouldn’t go overboard trying to cover every exception.


The Five Reasons Google Docs Is Not Ready To Replace Excel

Today’s post is from guest contributor Olivia Lennox.

Not too long after the initial excitement of having an application that looks exactly as a spreadsheet is meant to open up in seconds within your browser, Google Docs starts to let you down. Everything from performance on large sheets to the presentation of data is substandard when compared to even the Excel that ships with the 2003 version of Microsoft Office.

Here are the five main areas Google need to tackle before power spreadsheet users can even think about moving from Excel.

1. Graphical Presentation Of Data

One of the first tasks we attempted to replicate was to store and present some stock market performance data from an online share dealing account. To be fair to Google Docs Spreadsheet getting used to entering data was simple and the chart was quickly on it’s own sheet after clicking the add chart button and choosing a few options.

Here was where the vast power of Excel that we take for granted was obviously lacking. How about adding trend lines to scatter charts? This seems currently to be an either/or only feature in Google Docs which is disappointing. In fact only a limited amount of customization is possible at all in the Graphing function. The end result of this test was a graph with the correct headings, colours and lines on Excel and a poor compromise of one line, and unwanted colour and style in Google Docs.

With Excel hardly at the zenith of the data presentation world (some apps like Jaspersoft are near this level) it’s not really possible to consider Google Docs a suitable contender while it rests a long, long way behind Excel.

2. Load And Usage Performance For Large Sheets

Many organisations use Excel to manage very large amounts of data and do modelling on that data using some pretty complicated formulas. I’d say being able to handle these sheets with ease on a mid-range PC typical of many office settings should be a very high priority.

Unfortunately Google Docs significantly disappoints. In some simple tests with opening one large sheet, amending some fields and waiting for the recalculation Excel actually finished before we were able to enter the field on Google. This was repeated on five separate mid range machines with above average Internet connections and two gigabytes of RAM.

3. The Range Of Data Analysis And Presentation Tools In Excel Blows Google Away

Google does have the basics right here. Sorting data and pivot tables worked well and relatively quickly (although on the machines we tested Excel was always faster as noted above) and was easy to figure out for an experienced Spreadsheet user.

The Data Analysis TookPak in Excel however is just so powerful it’ll be a long time before any Spreadsheet can match it – let alone an online-only solution. Many challenging University level and beyond projects can be completed using these tools as most statistical functions are supported. A good guide can be found here which also uncovers some of the pitfalls of using a spreadsheet for some of these tasks.

4. Change In Scripting Language

Whilst this is not necessarily a fair criticism of Google Docs they have chosen to use Javascript as their scripting language whereas Microsoft uses VBA (Visual Basic for Applications). Not only are more casual users familiar with BASIC from their education – it takes school children who’ve learned BASIC just a few hours to get scripting in Excel at a simple level but existing spreadsheets are now significantly more difficult to port to Google Docs. Any corporation with a range of large sheets containing scripts needs to think very carefully about a move.

5. Security Concerns

“Security consultant Ade Barkah checked in with us to alert us to a couple of serious security issues associated to Google Docs, the web-based office software from the world’s most famous search engine company, giving a whole new meaning to its mission to make the world’s information universally accessible.”
TechCrunch 2009

Throughout the years since it’s launched there have been a number of so far minor security issues with Google Docs. However minor these have been and however quickly Google has fixed them corporations used to controlling their own security will be concerned with these issues. Microsoft hasn’t always had a perfect reputation for security but to be fair to them spreadsheets on your own private server haven’t been busy sharing a few graphs of private information with the World Wide Web.

2011 in review

The stats helper monkeys prepared a 2011 annual report for this blog.

Here’s an excerpt:

The concert hall at the Syndey Opera House holds 2,700 people. This blog was viewed about 21,000 times in 2011. If it were a concert at Sydney Opera House, it would take about 8 sold-out performances for that many people to see it.

Click here to see the complete report.

Combining chart types

Wow, it’s been a long time. This is a post I had planned for some time ago but didn’t get around to until now – as many All Black supporters might say, about bloody time

In a made-up scenario I have two rates to report on, Metric 1 and Metric 2. An indicator of good performance is that Metric 2 sits 5-10% above Metric 1. So when I plot the monthly data, as well as the two Metrics time series I’ll add in another series called Difference, that will be calculated as Metric 2 – Metric 1. I’ll also add in the constant series Target 1 (5%) and Target 2 (10%), and what I want to see is the Difference series sitting between these two targets. Here’s what that looks like:

Now, I don’t know what you think of that, but to me it’s confusing. The difference between Metric 1 and 2 and where it sits in relation to the 5% and 10% marks is the key indicator I need to follow, but both Metric 1 and 2 are also important to provide context. Putting them all together on the same line chart makes it harder than it needs to be to see what the indicator is doing.

Thankfully, there is a much simpler way to present this, which avoids the need for the eye to track all five time series together on the chart, and also manages to signal much more clearly where the key indicator is in relation to its target. To do this I can use the relatively simple expedient of combining different chart types.

First, I right-click the Metric 1 time series, select ‘Change Series Chart Type…’ and pick out the ‘Stacked Area’ chart type:

I also format the fill of the area series to a red colour:

Next, I make the same change of chart type to the Target 1 time series. I change the colour to orange (or amber if you prefer), and the name of the series to ‘0-5% above’:

I remove both the Difference and the Target 2 time series from the chart, and add the Target 1 series in again, by copying the data range from the worksheet, selecting the chart and pasting. Once that’s done, I again change the chart type of the new series to stacked area, set the fill colour to green and rename it to ‘5-10% above’. I also change the line colour of the Metric 2 series to a dark blue to provide a little more contrast against the area charts:

And that’s done. We now don’t need to show the ‘Difference’ series at all, as the magnitude of the difference between Metric 1 and 2 is clear without it. I also find the chart a lot simpler and more visually appealing. I should note in passing that I very rarely use the stacked area chart type, but it makes sense to me in this context.

Combining chart types like this is a simple and effective but under-used method – I guess that’s one side effect of the chart wizard, people assume that once the chart type is selected, that’s it. I first came across the idea at Jon Peltier’s site (this page has a lot of cool examples), it’s been a handy tool ever since.

An Undo Stack in VBA

I’m working on an Excel add-in which will allow a user to perform operations on cells. I’d also like to allow the user to ‘undo’ these operations. Excel itself, of course, does have an Undo button (shortcut Ctrl + Z), but this works only to undo changes made in the user interface, and is not going to help for changes made by VBA. This means the add-in will need to remember each operation performed and have them ready to supply for un-doing should the user want to do so.

Many people have created their own general solutions for undoing changes made in VBA – for instance, here’s one from Jan Karel Pieterse (who also created the hugely useful Name Manager add-in): Undo With Excel VBA. However I decided not to use a solution like this – it’s quite a lot of code to add to a project, and it is very general.

In this case I’m happy to write a more specific solution, which copes with undoing a specific action, by simply delegating the work of undoing the action to the object which did the action in the first place. Here’s an example:

' CellTest Class - allows calling code to change the value of a
' cell and then change it back

' Private Variables
Private mCell As Range
Private vOrigFormula As Variant

' Public Properties
Public Property Get Address() As String
  Address = mCell.Address
End Property

Public Property Set Cell(ByRef rCell As Range)
  Set mCell = rCell
  vOrigFormula = rCell.Formula
End Property

' Public Methods
Public Sub Change(ByVal sText As String)
  mCell.Value = sText
End Sub

Public Sub ChangeBack()
  mCell.Formula = vOrigFormula
End Sub

So what about storing up a list of operations for undoing? This is ideally suited to a stack. Stacks are neat data structures – they don’t do much, but they do it well and they’re really easy to code. Here’s one I picked up from the VBA Developer’s Handbook and tweaked a little to suit. There are two classes, StackItem and Stack. StackItem simply stores a value and a reference to another StackItem object – the next item on the stack.

' StackItem class holds a reference to the object it refers to and
' a reference to the next item in the stack

Public Value As Variant
Public NextItem As StackItem

Stack stores a reference to the item on the top of the stack, and exposes the functions Push (to push a new item on to the top) and Pop (to remove the top item and return a reference to it). Properties IsEmpty and Peek tell you if there are any items in the stack, and what the value of the top item is (without removing it), respectively. There’s a very nice visual explanation of how this looks here.

' Stack holds a reference to the top item in the stack data structure
' and manages the adding and removing of stack items

Private siTop As StackItem

' Public Properties
Public Property Get IsEmpty() As Boolean
    IsEmpty = siTop Is Nothing
End Property

Public Property Get Peek() As Variant
    If IsObject(siTop.Value) Then
        Set Peek = siTop.Value
        Peek = siTop.Value
    End If
End Property

' Public Methods
Public Function Push(ByRef varIn As Variant) As Boolean

    Dim siNew As StackItem

On Error GoTo PushError

    Set siNew = New StackItem
    If IsObject(varIn) Then
        Set siNew.Value = varIn
        siNew.Value = varIn
    End If
    Set siNew.NextItem = siTop
    Set siTop = siNew
    Push = True
Exit Function

    Push = False
End Function

Public Function Pop() As Variant

    If IsObject(siTop.Value) Then
        Set Pop = siTop.Value
        Pop = siTop.Value
    End If
    Set siTop = siTop.NextItem

End Function

So now to store an actions for undoing, I just need to push the object which performed it onto the undo stack. Every time I wish to actually undo an action, I just pop the object reference off, and instruct it to undo.

Next time, I’ll demonstrate how this looks in practice.

Maintaining SAS formats in Excel

For one of the subject areas I deal with day to day, categorical data values are often stored in the database in narrow character fields, so we need to apply SAS formats to get readable values out in reports. To maintain these formats I store them in an Excel workbook, looking roughly like this:

This is a relatively painless way to manage keeping SAS formats up to date, as well as making them easily accessible to people who need to inspect them. Each format is stored in a named range, with the name of the range matching the name of the format. The column headers are chosen to agree with the SAS Format procedure requirement for an input control data set. The above range, for instance, would be named ‘excel_user’, and would be used to input the required data for a character format (Type=’C’) also named ‘excel_user’ (Fmtname).

So how do I get them into SAS from Excel? Once again, the SAS/ACCESS Excel libname statement is helpful here. First, we assign a libref ‘wb’ to the workbook holding the format ranges, and another libref ‘sasprogs’ which will be used to store the format catalog:

libname wb Excel "&path.\formats.xls" ver=2002 ;
libname sasprogs "&path" ;

Next we need to get a list of all the ranges that we’re interested in here. In this workbook the only named ranges are the ones containing the formats. Any other names that the libname engine detects will be the names of worksheets, which will contain a ‘$’ character. So, we simply exclude these from the query against the dictionary.tables table:

proc sql;
  create table tmp as
  select memname
  from dictionary.tables
  where libname='WB' 
    and index(memname, '$') = 0 ;

So now the table ‘tmp’ holds a list of all the format range names in the workbook. Now we need to use this table to guide SAS in building those formats. What we need to do is to submit a proc format statement for every one of those names. Like so:

%macro load_fmt(name) ;
  proc format library=sasprogs.sasfmts cntlin=wb.&name ;
%mend load_fmt;

Now we simply need to loop through the set of names to load each of the formats into the sasprogs.sasfmts catalog:

data _null_;
  set tmp;
  call execute('%load_fmt('||memname||')') ;

All that’s left to do now is a little tidying up:

/*Close the file*/
libname wb clear ;
/*Inspect format contents*/
proc format library=sasprogs.sasfmts fmtlib cntlout=sasprogs.sasfmt_data; run;
/*Add them into the search path*/
options fmtsearch=(library sasfmts);

And we’re done – all the formats are now loaded and available in the SAS session. Now if there are any changes, or a new format needs to be created, just add it into the file, save it and re-run this script. Sorted!

Why Corporate Excel Training Sucks For Everyone

Today’s post comes from Sean Duffy, CEO at Excel Everest. | Corporations need to rethink the way they train their employees in Microsoft Excel.

Corporate Excel trainings, if they are offered at all, are usually three hour classes in which 20 people gather around a projector and more or less watch an instructor go through various features of Excel. Occasionally, students are asked to join along and type in a formula or two, but on the whole, it’s usually a teaching exercise, not a learning exercise.

At Excel Everest, we’ve heard this numerous times from numerous people, and we built our whole business around the premise that firstly, those who have a strong grounding of Excel inside organizations tend to be more successful, and secondly, that the current method of training employees is antique.

After having gone through a number of Excel trainings ourselves, we don’t think people are *actually* learning from the corporate Excel classes. Along with countless users we’ve talked to, a recent bit of feedback from an Excel Everest customer lays this out pretty clearly:

I have attended numerous one and two day programs “teaching” excel. I found that once the program is over, I am no better than I was before. For me most programs show us all the things Excel can do but they don’t really teach you Excel. Your program is the first that I feel that I am learning it such that I can use it. I also like that I can repeat and review any particular lesson any time. If you don’t do/use it you lose it. – Kimber

Kimber, like many others, has been there… she’s attended classes that have no relevance, no real world scenarios, and no accountability. Plus, our guess is that it’d be difficult to describe any of them as “fun.”

No only is this a losing situation for people like Kimber, but from the perspective of the company who’s paying for trainings, it’s also a risky proposition. There’s no way to know that employees are actually learning from the trainings, given that there’s no accountability. Return on investment simply cannot be shown.

We’ve thought deeply how to fix this problem of ROI as well as the problem of employees, well, not learning Excel, and we built Excel Everest, a complete training course in Excel that’s built entirely into an Excel file. Imagine a huge workbook that teaches employees Excel, challenges them with exercises and grades you automatically in a comprehensive scoreboard. We’ve built Excel Everest to be a self-service and fun learning tool. An employee can complete the tutorial on her own time.

From the employers standpoint, this makes perfect sense as well. If an employee completes Excel Everest, she can simply send the completed document over to her manager, and her manager can quickly see that she has learned the material. It’s as easy as that. The employer knows that the money they spent on Excel Everest is paying off.

If you’re interested in using Excel Everest inside your organization, say hello at

// Excel Everest ( is a complete Excel training course built inside an Excel file. The goal of Excel Everest is to provide an immersive, interactive, learning Experience for employees, all while demonstrating a return on investment to employers. Excel Everest is currently being used in organizations such as Google, Hymans Roberston, and PlayCore.