Building an Excel currency converter, part 2

To recap where we left off in the first part, we’d managed to import xml data from the rss feed for NZD rates at themoneyconverter.com into a table (which I’ve named Rates_Table). This gave us a list of rates which we can use – however the format is not quite right. In the “description” member of “item”, the rate is expressed in a structured text format, like so: “1 New Zealand Dollar = 2.96886 Argentine Peso”. Now of course that would be useful as it is, if I were prepared to manually type out the 2.96886 part myself to convert from NZ dollars to pesos. Which of course I’m not, lazy data analyst that I am.

What I’m going to do instead is to extract the two relevant parts of that text, the two bits that I need to know: the rate (the numeric bit, in this case 2.96886) and the foreign currency description (the character string that comes after the numeric bit, in this case “Argentine Peso”). This is something that can be done with text functions like MID and FIND in Excel, of course, and this is a perfectly legitimate way to go about it. However, unless you can find a really clever way to do it, you’ll end up with a really long, convoluted and awkward formula. Which is a great reason to use regular expression functions instead. RXGET for instance, is designed to do exactly this sort of extraction. Here’s how I went about it:

First, we need to determine the regular expression pattern that fits the text. The pattern I chose is
=\s(\d*\.\d+)\s(.+)
or to break it down a little,

  1. an equals sign followed by a space: =\s;
  2. followed by a group, which in turn contains zero or more digits, followed by a decimal point, followed by one or more digits: (\d*\.\d+) This will be used to capture the rate part of the expression. Note that the decimal point needs to be escaped here, as it signifies an arbitrary character in a regular expression;
  3. followed by another space: \s;
  4. followed by a second group, which contains one or more arbitrary characters: (.+) This will capture everything after the space, which in this case will be the currency description. Here the decimal point should not be escaped as we need it to match any character.

Once I know that this pattern works, I add a new sheet titled ‘Reference’ and enter the pattern into a cell that I name Rate_Pattern. Now I can use it to extract the rate and the currency using RXGET. I add a column with the header rate at the right-hand end of the Rates_Table and enter in it the RXGET formula, with the following arguments:

This corresponds to the formula:
=RXGET(Rate_Pattern,Rates_Table[[#This Row],[description]],1).
Note that the Submatch argument has the value 1, as we’re extracting the 1st matched group from the expression. To get the currency description, in another new column with the header currency I enter the same formula with the Submatch argument changed to 2 to extract the second group. So here’s how the table looks right now (click on the image to take a closer look):

In the final part of this series I’ll complete the converter by adding in a selector to pick out a rate from the table by specifying the currency.

Advertisements

Regex Functions updated

Taking a brief pause on the way to completing the currency converter, to revise the functions in the Excel regular expressions addin, Regex.xla. I’m really rather slack in leaving it this long – it was pointed out to me several months ago that the way I had passed error values back to the calling cell did not work as intended if the return value was not a Variant. For instance in RXGET the intent of

RXGET = CVErr(xlErrNA)

is to have Excel display #N/A in the cell. As it’s returning a String rather than a Variant however, it displays #VALUE! instead. So I’ve altered the return types to Variant in all the functions, to allow for more meaningful error values to be returned.

It was also pointed out more recently that using IsMissing to test for the presence of an optional parameter and provide a default, like so:

If IsMissing(start_num) Then start_num = 0

will not work, as IsMissing works on Variants only, rather than primitive data types. Actually it’s a better idea to specify default parameters anyway, then there’s no need for this logic. So that’s what I’ve done.

Thanks to Colin and mifrey for pointing out the above issues. I’ve replaced the add-in on fileden, and if you like the code changes can be inspected at the Regex functions page.