Stand Back – I know Regular Expressions


Part 1 of 4

Yes, it’s true, I do – and that post has to be one of my favourite from the xkcd strip. I’ve used them many times over the last couple of years, in projects ranging from Python scripts written on weekends to critical SAS scripts I’ve written for work. So what are they?

Well, I’m not going to get into all of the gory details here, I’ll leave that to Wikipedia. But for the purpose of a quick and simple explanation, regular expressions are a very powerful tool used to match text. You know how, when you’re using the autofilter in Excel, you can match an arbitrary string of characters by using the wildcard *, or a single arbitrary character using ? So for instance, you could match the words “retail” and “detail” with the expressions ?etail or *ail.

Well, regular expressions offer the ability to do that sort of matching – but they offer far more besides. The core idea is that a regular expression pattern offers a compact and unambiguous way to denote a set of character strings. For example, the pattern \b\w+(e|d)\b denotes the set of all words which end with either e or d. And the pattern \d{3,5}\s+(Jim|Bob) matches all strings of digits with length at least 3 and at most 5, followed by whitespace, followed by either “Jim” or “Bob”.

As I said, I’m not going to get into all the details right here, so I’ll stop short of a proper exploration of regular expression syntax, which would be available as part of most introductory tutorials for Perl, Ruby or Python, to name but a few. However, I do want to outline how all this relates to my current project. Actually, in a sense, it is my current project.

You see, as soon as I discovered regular expressions, I started wondering why such a powerful tool was not implemented in as versatile an application as Excel, with all of its bells and whistles. While we do have things like the Find dialog, the MATCH() function and the Auto and Advanced filters, the ability to harness the power and simplicity of regular expression pattern-matching would really come in handy on occasion. Well, it turns out that the ability is there – it’s just slightly hidden. Microsoft included a regular expression engine in VBScript back in 1999, the latest version of which can be used in a VBA project by adding a reference to it, like so:

 

Add a reference to the Microsoft VBScript Regular Expression Library

Add a reference to the Microsoft VBScript Regular Expression Library

Now, while all the power of this engine is available to the VBA developer, my thinking is that it would be helpful to make this available where the rubber meets the road for the majority of Excel users – on the worksheet. My idea is to create a suite of worksheet functions and forms which can be used as an alternative to built-in functions, allowing users to harness the power of regular expressions. Next time, I’ll detail the functions I intend to include, with a brief summary of their implementation.

Advertisements

3 thoughts on “Stand Back – I know Regular Expressions

  1. Tim Mayes says:

    Geoff, this is a good idea and I’m looking forward to the rest of the series. I haven’t used regex much (mainly in .htaccess, but also in some PHP), but I’ll be interested to see what you come up with.

    I’m sure that I’ll discover some other uses for regex in Excel, but the main one that I can see is a boolean function that tests a string for a match against a pattern. That would be very useful.

  2. geoffness says:

    Thanks Tim – I agree, that’s precisely the sort of thing I’ll be putting in there. Basically the intention is to emulate functions like MATCH() and FIND() with regex functionality, and also to include some functions using the Match and Submatch collections to extract matching text. I must confess to this being the first general-purpose add-in I’ve tried to build, so it’d be great to have some feedback on this as I go. Cheers!

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out / Change )

Twitter picture

You are commenting using your Twitter account. Log Out / Change )

Facebook photo

You are commenting using your Facebook account. Log Out / Change )

Google+ photo

You are commenting using your Google+ account. Log Out / Change )

Connecting to %s