How do you SQL in Excel?

QueryCell is the Excel add-in from Oak Focus Software that brings SQL into Excel. I reviewed version 1.4 some time ago, and a few days ago lead developer Sam Howley told me about the release of version 2.0.

The new version is a complete rewrite and offers a noticeable improvement in speed, stability and responsiveness, in addition to support for 64-bit Excel. As always I’m impressed with the clean, simple look and feel of the interface, which slides out the editor when you need to use it and tucks it away when you’re done. You can manipulate data from inside the current workbook, or externally through an ODBC connection.

Sam has very kindly agreed once again to give away some free licences to Number Cruncher readers. All you have to do to get yourself one is to send me an email describing how you’ve used SQL in Excel. Could be bringing external data in through a query, or querying tables in the current workbook, could be using the Excel Data menu and MS Query to add in a connection, or scripting an ADO connection using VBA, or using QueryCell. Surprise me! Send me a description of what the information need was and how you went about resolving it.

Email me at the address at about, with ‘SQL in Excel’ in the subject line, by 10:00 pm NZDT on Saturday 14 July. I’ll judge the best/most interesting of these and publish them in a follow-up post after the weekend, so it’s important that you’re ok with other people reading what you send me.

Regular Expressions – new version

Just a quick post tonight to let you all know, I’ve added a new page for downloads, which contains a new version of the regular expressions add-in, compatible with Excel 2007 and later. I’ve added in a couple of utility functions for convenience (mine more than yours but you might find them useful), and a form to evaluate regular expressions against test strings. And there’s even documentation!

The documentation gives a (very) brief intro to the uses and abuses of regular expressions, a run-down of the worksheet functions in the add-in and some examples of their use. Here are a couple of those, I hope you find them useful.

Matching cells which contain variations on a word

There are some words in the English language which Americans, god bless them, spell in their own special way. However, given input on the spreadsheet from users who spell both ways (correctly and incorrectly), you may wish to match both variations of words like ‘realise’ (‘realize’) and ‘colour’ (‘color’).
The pattern to match realise/realize is simple: \breali(s|z)e\b
The word boundary markers ensure we are looking at a complete word, and the alternation of (s|z) means that we match both versions.
Applying the ISRXMATCH formula demonstrates this is successful:

Validating Email Addresses

Given a list of email addresses in a column on a spreadsheet, we wish to ensure that these stick to a form which at least obeys some of the rules governing the format of email addresses. As these are going to be used by a script to send emails, we wish to minimise the number of undeliverable responses due to invalid addresses. The basic rules we specify for these addresses are as follows:
The username part of the address contains one or more alphanumeric characters, and possibly some additional special characters. This is followed by a single @ sign, followed by the domain name, which consists of one or more alphanumeric and special characters, ending with a dot followed by the top-level domain. This must contain only alphanumeric characters, and there must be between 2 and 6 of these. The address should be the entire content of the cell, so the beginning and ending anchors are used at the start and end of the pattern. Case is unimportant, so the case_sensitive flag is set to false.
The pattern is as follows: ^[a-z0-9_%.+-]+@[a-z0-9-.]+\.[a-z]{2,6}$
This is then used in ISRXMATCH – a valid email address according to our rules above will return true:

The second address in the list fails due to the whitespace in the username, whereas the fourth fails because the domain name does not include a top-level domain part of a dot followed by 2-6 letters.
I borrowed this regex from http://www.regular-expressions.info/email.html. As well as a couple of alternative regexes to cover some edge cases which the above doesn’t catch, this page also discusses why email addresses can be tricky, and why you shouldn’t go overboard trying to cover every exception.

Some new regular expression functions

There are a couple of SAS functions I have found to be quite useful, and that I miss when I’m working in Excel. Probably the more useful of the two is the SCAN function. You give SCAN a text string, an integer n and optionally some delimiters, and SCAN will give you back the nth ‘word’ in a string, where the ‘words’ are delimited either by the characters you supplied or by default delimiter characters. For instance,

data _null_;
word5 = SCAN("The/number(of+potential.interactions|among^a)large*group<of-people,is$proportional.to!the square/of!their&number.", 5);
put word5=;
run;

will print

word5=interactions

to the log. If n is negative, it will count backwards (from right to left). So

data _null_;
word7 = SCAN("The/number(of+potential.interactions|among^a)large*group<of-people,is$proportional.to!the square/of!their&number.",-7);
put word7=;
run;

will print

word7=proportional

There’s also the function COUNTW, which will also take a text string and some delimiters as arguments, and returns the number of words in the string.

data _null_;
count = COUNTW("Consider, for example, the question of tax policy.");
put count=;
run;

produces

count=8

Once again regular expressions turn out to be ideally suited to translating this functionality to Excel worksheet functions. I’ve included them in Regex.xla as RXSCAN and RXCOUNTW:

Public Function RXSCAN(ByVal stringval As String, ByVal n As Integer, _
                        Optional ByVal dlm As String = "") As Variant
' RXSCAN - Returns the nth word from stringval, where
' words are delimited by any non-word character by default,
' or a specific character set by optional string dlm.

Dim objRegex As VBScript_RegExp_55.RegExp
Dim colMatch As VBScript_RegExp_55.MatchCollection
Dim charlist As String
Dim char As String
Dim i As Integer

' Build pattern
If dlm = "" Then ' Just looking for A-Za-z0-9_
  charlist = "\w+"
Else ' Negate character class specified
  charlist = "[^"
  For i = 1 To Len(dlm)
    char = Mid$(dlm, i, 1)
    If InStr("[\^-", char) Then char = "\" & char ' escape
    charlist = charlist & char
  Next i
  charlist = charlist & "]+"
End If

' Initialise Regexp object, get match collection
Set objRegex = New VBScript_RegExp_55.RegExp
With objRegex
  .Pattern = charlist
  .Global = True
  Set colMatch = .Execute(stringval)
End With

If colMatch.Count = 0 Then ' No words detected
    RXSCAN = CVErr(xlErrNA)
Else
  If Abs(n) > colMatch.Count Or n = 0 Then
    RXSCAN = CVErr(xlErrNum)
  Else
    If n > 0 Then ' Read from left to right
      RXSCAN = colMatch(n - 1)
    Else ' Right to left
      RXSCAN = colMatch(colMatch.Count + n)
    End If
  End If
End If

End Function

Public Function RXCOUNTW(ByVal stringval As String, _
                        Optional ByVal dlm As String = "") As Variant
' RXCOUNTW - Returns the count of words from stringval, where
' words are delimited by any non-word character by default,
' or a specific character set by optional string dlm.
Dim objRegex As VBScript_RegExp_55.RegExp
Dim colMatch As VBScript_RegExp_55.MatchCollection
Dim charlist As String
Dim char As String
Dim i As Integer

' Build pattern
If dlm = "" Then ' Just looking for A-Za-z0-9_
  charlist = "\w+"
Else ' Negate character class specified
  charlist = "[^"
  For i = 1 To Len(dlm)
    char = Mid$(dlm, i, 1)
    If InStr("[\^-", char) Then char = "\" & char ' escape
    charlist = charlist & char
  Next i
  charlist = charlist & "]+"
End If

' Initialise Regexp object, get match collection
Set objRegex = New VBScript_RegExp_55.RegExp
With objRegex
  .Pattern = charlist
  .Global = True
  Set colMatch = .Execute(stringval)
End With

If colMatch.Count = 0 Then ' No words detected
  RXCOUNTW = CVErr(xlErrNA)
Else
  RXCOUNTW = colMatch.Count
End If

End Function

I’ve also moved the add-in file to a dropbox folder. You can now find Regex.xla at http://dl.dropbox.com/u/29927305/Regex.xla

Regex Functions updated

Taking a brief pause on the way to completing the currency converter, to revise the functions in the Excel regular expressions addin, Regex.xla. I’m really rather slack in leaving it this long – it was pointed out to me several months ago that the way I had passed error values back to the calling cell did not work as intended if the return value was not a Variant. For instance in RXGET the intent of

RXGET = CVErr(xlErrNA)

is to have Excel display #N/A in the cell. As it’s returning a String rather than a Variant however, it displays #VALUE! instead. So I’ve altered the return types to Variant in all the functions, to allow for more meaningful error values to be returned.

It was also pointed out more recently that using IsMissing to test for the presence of an optional parameter and provide a default, like so:

If IsMissing(start_num) Then start_num = 0

will not work, as IsMissing works on Variants only, rather than primitive data types. Actually it’s a better idea to specify default parameters anyway, then there’s no need for this logic. So that’s what I’ve done.

Thanks to Colin and mifrey for pointing out the above issues. I’ve replaced the add-in on fileden, and if you like the code changes can be inspected at the Regex functions page.

TableCell – Beta Testers needed

Last year I posted a review of the QueryCell Excel add-in from Oak Focus Software. A few weeks ago, the developer of QueryCell let me know about a new add-in he’s developed called TableCell, a simplified version of QueryCell. Actually it was several weeks ago, sorry Sam – I’ve been a lot slower getting to this than I hoped.

Anyway, TableCell requires beta testers to do beta testing, so if you would like to help out (and get yourself some free trial software) head on over to TableCell and take a look. Basically the idea is to allow users to read from and write to database tables via ODBC. I’ve had a play and it will allow you to do just that, but not a lot more – it’s not a developer tool, it’s more aimed at the user who needs to do simple monitoring and updating without in-depth knowledge of either SQL or the data model in the DB. I see the same strength in the UI that I see in QueryCell, i.e. it’s polished, intuitive and requires minimal effort to get used to. Connecting to a DB is very easy – the one thing I would have liked to see and didn’t was a list of available tables. I know that introspection adds on a bit of overhead but in my view given the audience I see this add-in aimed at it’s probably worth it. It also requires the user to enter the name of the table’s primary key to extract data.

That’s all for now – upcoming posts over the next month include a review of some open data sources, a new Python/Excel hookup, and a brief how-to on XML data sources in Excel, which I’ll be using to build a simple currency converter.

QueryCell – SQL in Excel

QueryCell is an Excel add-in from Oak Focus Software that gives Excel users the ability to use SQL in Excel. Version 1.4 was released this month and lead developer Sam Howley gave me the chance to take a test drive, so I downloaded it and tried it out.

Before I get into what I thought of it, I know that Dick Kusleika at Daily Dose of Excel reviewed QueryCell last year. A number of questions were asked in comments then around what QueryCell provides that isn’t already available in Excel. It’s a fair question – after all, SQL in Excel is not a new concept – either using VBA and ADO, or MS Query. Those are all great tools for querying data from and in Excel, after you get to know how to apply them. After spending some time using QueryCell, I think the standout difference can be summed up with two pairs of hyphenated words: single-click and user-friendly. From the worksheet to the SQL editor: one click. No need to select the data source, specify the driver or write a connection string. One click and QueryCell opens up, auto-detects all data regions in the active workbook(s) and has the SQL editor ready to go.

Also, QueryCell provides two features that really sold me: Table DDL and the SQL Insert Generator. The Table DDL window displays a CREATE TABLE statement for the selected table:

Checking the ‘Use Custom Table DDL’ checkbox allows you to edit the statement, which is handy: if a column in the table has a reserved word (like Date or Case) as a title, QueryCell will auto-assign a generated name to replace it in the DDL. This is necessary in order to allow the Firebird database that QueryCell uses at the backend to accept the data. Also, the data type assigned to the columns is automatically set to VARCHAR(512), unless the columns have suffixes which identify them as another type, so these may need to be changed.

The SQL Insert Generator window is available on right-clicking inside a table on the worksheet, and generates an INSERT statement for each row in the table:

The ‘View Generator SQL’ button allows you to see the select statement that’s used to generate the INSERTs, which can be modified and re-run if required. The statements can then be copied to the clipboard, or saved to a script file.

What is it that makes these features so useful? Both the Table DDL and the Insert Generator appeal to me as tools to assist with migrating data from Excel to a database, which is a battle I’ve had to fight on more than one occasion.

Some other cool features: both the SQL Editor in the main window and the Insert Generator feature a ‘Format SQL’ button, which is nice. Column names are auto-completed in the SQL editor – also very nice. There’s a test data generator, which is cool for getting development going without needing to have data in the right shape beforehand, and a well-developed SQL/QueryCell tutorial.

Overall impressions: I’m really taken with QueryCell. It delivers some great functionality which I see as hugely useful in the context of what I do with Excel day-to-day. The UI has a very polished and responsive feel, and makes good use of space. It’s also very intuitive and easy to use, which as I said earlier is one of the key differences that sell it to me over existing alternatives. It’s modestly priced at $45 US, and offers a fully-functional trial download which is good for 30 queries before expiry. Well worth giving a whirl, which I thoroughly recommend.

Special Offer: Sam has generously provided me with 4 free licences to give away. The first four to email me (address at About Me) with “QueryCell” in the subject line will get a licence key sent back to them – please note I won’t be able to reply straightaway, but you can expect a reply within 24 hours.

Edit: Thanks for all the emails, people – the four licences have now all gone to good homes. I’d encourage you to download the trial anyway and see what you think…

More Regex Functions

And finally, I’m back! I finally overcame the inertia yesterday and got back to work on the regular expressions add-in. The next goal is to include regular expressions equivalents for Excel’s built-in advanced filter and find/replace functions. So far I’ve concentrated on just the filter part, which I hope to have finished fairly soon. However as there’s a fair amount of code involved, I’ll split this into 2 or 3 posts.

Firstly, the advanced filter dialog. Not being a big fan of re-inventing the wheel I decided to build pretty much exactly the same form as you find Excel presenting you with for an ordinary advanced filter. Here’s what my version looks like:

 

All that this form needs to do is collect 3 key pieces of information to control the filter process:

  1. What ranges will the filter operate on?
  2. Will the source range (List Range) be filtered in place, or will the filtered results be copied to another place (Copy to)?
  3. Does the filtered list need to display unique records only?

Here’s the code behind the form. The public properties List, Criteria and Extract return the addresses for the ranges in (1) from the RefEdit controls, and the booleans CopyTo and UniqueOnly return the values for (2) and (3) respectively.

Option Explicit

Private mbOK As Boolean

'**********************************************************
' Form class public properties
'**********************************************************

' User clicked OK
Public Property Get OK() As Boolean
    OK = mbOK
End Property
'**********************************************************
' Ranges selected for filter
'**********************************************************
Public Property Get List() As String
' List range to be filtered
    List = refList.Value
End Property

Public Property Get Criteria() As String
' Range containing criteria for filter
    Criteria = refCriteria.Value
End Property

Public Property Get Extract() As String
' Range to copy filtered rows to
    Extract = refExtract.Value
End Property

'**********************************************************
' Properties modifying filter operation
'**********************************************************

' User selected Copy to range option button
Public Property Get CopyTo() As Boolean
    CopyTo = optCopy.Value
End Property

' User selected Unique values only checkbox
Public Property Get UniqueOnly() As Boolean
    UniqueOnly = chkUnique.Value
End Property

'**********************************************************
' Form controls
'**********************************************************
Private Sub cmdCancel_Click()
    mbOK = False
    Me.Hide
End Sub

Private Sub cmdOk_Click()
    mbOK = True
    Me.Hide
End Sub

Private Sub optCopy_Click()
    
    ' enable refExtract
    lblExtract.ForeColor = vbButtonText
    With refExtract
        .Enabled = True
        .BackColor = vbWindowBackground
    End With
    
End Sub

Private Sub optInPlace_Click()

    ' grey out and disable refExtract
    lblExtract.ForeColor = vbGrayText
    With refExtract
        .Enabled = False
        .BackColor = vbInactiveCaptionText
    End With
    
End Sub

Private Sub UserForm_Initialize()

    ' Set default values for List range and optInPlace
    refList.Value = Application.ActiveCell.CurrentRegion.Address
    optInPlace.Value = True

End Sub

Private Sub UserForm_QueryClose(Cancel As Integer, CloseMode As Integer)

    If CloseMode = vbFormControlMenu Then
        cmdCancel_Click
        Cancel = True
    End If

End Sub

And here’s the sub GetRanges, which ‘reads’ the form:

Private Sub GetRanges(ByRef rngSource As Excel.Range, _
                ByRef rngCriteria As Excel.Range, _
                ByRef rngExtract As Excel.Range, _
                ByRef bUnique As Boolean)
                
    Dim frmFilter As FRXFilter

    Set frmFilter = New FRXFilter
    frmFilter.Show
    
    If frmFilter.OK Then
        With frmFilter
            If .List <> "" Then _
                Set rngSource = Range(.List)
            If .Criteria <> "" Then _
                Set rngCriteria = Range(.Criteria)
            If .CopyTo Then _
                Set rngExtract = Range(.Extract)
            bUnique = .UniqueOnly
        End With
    End If
                
End Sub

I also want to include some error-handling to cope with bad range addresses, but apart from that it’s pretty much there. Next time I’ll post the sub at the next level up, calling GetRanges and a couple of other routines to perform the filtering.