How do you SQL in Excel?

QueryCell is the Excel add-in from Oak Focus Software that brings SQL into Excel. I reviewed version 1.4 some time ago, and a few days ago lead developer Sam Howley told me about the release of version 2.0.

The new version is a complete rewrite and offers a noticeable improvement in speed, stability and responsiveness, in addition to support for 64-bit Excel. As always I’m impressed with the clean, simple look and feel of the interface, which slides out the editor when you need to use it and tucks it away when you’re done. You can manipulate data from inside the current workbook, or externally through an ODBC connection.

Sam has very kindly agreed once again to give away some free licences to Number Cruncher readers. All you have to do to get yourself one is to send me an email describing how you’ve used SQL in Excel. Could be bringing external data in through a query, or querying tables in the current workbook, could be using the Excel Data menu and MS Query to add in a connection, or scripting an ADO connection using VBA, or using QueryCell. Surprise me! Send me a description of what the information need was and how you went about resolving it.

Email me at the address at about, with ‘SQL in Excel’ in the subject line, by 10:00 pm NZDT on Saturday 14 July. I’ll judge the best/most interesting of these and publish them in a follow-up post after the weekend, so it’s important that you’re ok with other people reading what you send me.

Advertisements

Summing a random sample

A colleague came to me recently with a problem:
I have a bunch of numbers in a spreadsheet, and I need to take a random sample of 50 of them, and sum it. I can do the sample by random sorting, but I need to repeat this several times, so I’d like a function that I can repeat without having to muck around.

Well, sampling – I have some code which will do just that. After that, I think I should be able to manage summing the array.

Here’s the code for creating a random sample:

Private Sub SampleArray(ByRef avBigset As Variant, ByRef avSmallset As Variant)
    ' SampleArray populates a random sample avSmallset from an array avBigset
    ' without replacement (each element in avBigset is considered once only)
    Dim lRemainder As Long
    Dim lSize As Long
    Dim lOb As Long
    Dim lPickit As Long
        
    ' Make sure we're dealing with arrays...
    If Not IsArray(avBigset) Or Not IsArray(avSmallset) Then Exit Sub
    ' Initialise
    lRemainder = UBound(avBigset)
    lSize = UBound(avSmallset)
    Randomize 0
    
    Do While lSize > 0  ' Still some left to pick up
        lOb = lOb + 1
        If Rnd < lSize / lRemainder Then
            lPickit = lPickit + 1
            avSmallset(lPickit) = avBigset(lOb)
            lSize = lSize - 1
        End If
        lRemainder = lRemainder - 1
    Loop    ' Sample complete
    
End Sub

The issue is that it’s built to work with 1d arrays, so given a range as input in the function, I would need to push the values of the range into an array. I could do that with something simple like the below:

Private Function arr(ByRef rng As Range) As Variant
    ' Convert range to 1-D array
    Dim i As Long
    Dim j As Long
    Dim k As Long
    Dim rowCount As Long
    Dim colcount As Long
    Dim lim As Long
    Dim sArr() As String
    Dim vArr As Variant
    
    vArr = rng.Value2
    
    ' dimension array
    rowCount = UBound(vArr)
    colcount = UBound(vArr, 2)
    lim = rowCount * colcount
    ReDim sArr(1 To lim)
    k = 1
    
    ' populate
    For i = 1 To rowCount
        For j = 1 To colcount
            sArr(k) = vArr(i, j)
            k = k + 1
        Next j
    Next i
    
    arr = sArr
    
End Function

This will do the trick, but performance-wise it works out pretty horribly. The bigger the array, the longer this step will take, and it’s all useless, unnecessary overhead. The better approach is I think to revise the SampleArray function so it will work with a range rather than an array.

Actually this turns out to be very easy. The SampleArray code loops over an array, but with a range, we can just loop over the Cells collection. The rest of the code is pretty much identical, except that we’ll use a For Each…Next loop rather than Do While:

Private Sub SampleRange(ByRef rngBigset As Variant, ByRef avSample As Variant)
    ' SampleRange populates a random sample avSample from a range rngBigset
    ' without replacement (each element in rngBigset is considered once only)
    Dim lRemainder As Long
    Dim lSize As Long
    Dim lPickit As Long
    Dim rngCell As Excel.Range
    
    
    ' Initialise
    lRemainder = rngBigset.Cells.Count
    lSize = UBound(avSample)
    Randomize 0
    
    For Each rngCell In rngBigset.Cells
        If lSize <= 0 Then Exit For ' Sample complete
        If Rnd < lSize / lRemainder Then
            lPickit = lPickit + 1
            avSample(lPickit) = rngCell
            lSize = lSize - 1
        End If
        lRemainder = lRemainder - 1
    Next rngCell

End Sub

So is there a significant performance gain in doing it this way, taking out the array function? Unsurprisingly, yes there is. I created two functions, SAMPLESUM, which requires the conversion to an array beforehand:

Public Function SAMPLESUM(ByRef in_range As Range, ByVal sample_size As Long) As Variant
    ' Sum a random sample of size sample_size from in_range
    Dim sample_array() As Double
    Dim in_array As Variant
    Dim index As Long
    
    ' set up large and small arrays
    in_array = arr(in_range)
    ReDim sample_array(1 To WorksheetFunction.Min(UBound(in_array), sample_size))
    
    ' get sample
    SampleArray in_array, sample_array
    ' sum sample array
    SAMPLESUM = WorksheetFunction.Sum(sample_array)
    
End Function

and SAMPLESUM2, working directly with the range:

Public Function SAMPLESUM2(ByRef in_range As Range, ByVal sample_size As Long) As Variant
    ' Sum a random sample of size sample_size from in_range
    Dim sample_array() As Double
    Dim index As Long
    
    ' set up large and small arrays
    ReDim sample_array(1 To WorksheetFunction.Min(in_range.Cells.Count, sample_size))
    
    ' get sample
    SampleRange in_range, sample_array
    ' sum sample array
    SAMPLESUM2 = WorksheetFunction.Sum(sample_array)

End Function

I then set up a function which timed 100 calculations of each version, taking a sample of size 10 from a range of numbers with 50,000 rows and 10 columns:

Public Sub timeit()
Dim starttime As Double
Dim endtime As Double
Dim totaltime As Double
Dim i As Integer
Dim numrange As Excel.Range

Set numrange = Range("A1:J50000")


starttime = Timer

For i = 1 To 100
    SAMPLESUM numrange, 10
Next i

endtime = Timer
totaltime = endtime - starttime

Debug.Print "Time taken for 100 calculations (SAMPLESUM): " & totaltime & " seconds"

starttime = Timer

For i = 1 To 100
    SAMPLESUM2 numrange, 10
Next i

endtime = Timer
totaltime = endtime - starttime

Debug.Print "Time taken for 100 calculations (SAMPLESUM2): " & totaltime & " seconds"

End Sub

Which output:

Time taken for 100 calculations (SAMPLESUM): 44.140625 seconds
Time taken for 100 calculations (SAMPLESUM2): 12.546875 seconds

I think I’ll go with door number 2.

Regular Expressions – new version

Just a quick post tonight to let you all know, I’ve added a new page for downloads, which contains a new version of the regular expressions add-in, compatible with Excel 2007 and later. I’ve added in a couple of utility functions for convenience (mine more than yours but you might find them useful), and a form to evaluate regular expressions against test strings. And there’s even documentation!

The documentation gives a (very) brief intro to the uses and abuses of regular expressions, a run-down of the worksheet functions in the add-in and some examples of their use. Here are a couple of those, I hope you find them useful.

Matching cells which contain variations on a word

There are some words in the English language which Americans, god bless them, spell in their own special way. However, given input on the spreadsheet from users who spell both ways (correctly and incorrectly), you may wish to match both variations of words like ‘realise’ (‘realize’) and ‘colour’ (‘color’).
The pattern to match realise/realize is simple: \breali(s|z)e\b
The word boundary markers ensure we are looking at a complete word, and the alternation of (s|z) means that we match both versions.
Applying the ISRXMATCH formula demonstrates this is successful:

Validating Email Addresses

Given a list of email addresses in a column on a spreadsheet, we wish to ensure that these stick to a form which at least obeys some of the rules governing the format of email addresses. As these are going to be used by a script to send emails, we wish to minimise the number of undeliverable responses due to invalid addresses. The basic rules we specify for these addresses are as follows:
The username part of the address contains one or more alphanumeric characters, and possibly some additional special characters. This is followed by a single @ sign, followed by the domain name, which consists of one or more alphanumeric and special characters, ending with a dot followed by the top-level domain. This must contain only alphanumeric characters, and there must be between 2 and 6 of these. The address should be the entire content of the cell, so the beginning and ending anchors are used at the start and end of the pattern. Case is unimportant, so the case_sensitive flag is set to false.
The pattern is as follows: ^[a-z0-9_%.+-]+@[a-z0-9-.]+\.[a-z]{2,6}$
This is then used in ISRXMATCH – a valid email address according to our rules above will return true:

The second address in the list fails due to the whitespace in the username, whereas the fourth fails because the domain name does not include a top-level domain part of a dot followed by 2-6 letters.
I borrowed this regex from http://www.regular-expressions.info/email.html. As well as a couple of alternative regexes to cover some edge cases which the above doesn’t catch, this page also discusses why email addresses can be tricky, and why you shouldn’t go overboard trying to cover every exception.

The Five Reasons Google Docs Is Not Ready To Replace Excel

Today’s post is from guest contributor Olivia Lennox.

Not too long after the initial excitement of having an application that looks exactly as a spreadsheet is meant to open up in seconds within your browser, Google Docs starts to let you down. Everything from performance on large sheets to the presentation of data is substandard when compared to even the Excel that ships with the 2003 version of Microsoft Office.

Here are the five main areas Google need to tackle before power spreadsheet users can even think about moving from Excel.

1. Graphical Presentation Of Data

One of the first tasks we attempted to replicate was to store and present some stock market performance data from an online share dealing account. To be fair to Google Docs Spreadsheet getting used to entering data was simple and the chart was quickly on it’s own sheet after clicking the add chart button and choosing a few options.

Here was where the vast power of Excel that we take for granted was obviously lacking. How about adding trend lines to scatter charts? This seems currently to be an either/or only feature in Google Docs which is disappointing. In fact only a limited amount of customization is possible at all in the Graphing function. The end result of this test was a graph with the correct headings, colours and lines on Excel and a poor compromise of one line, and unwanted colour and style in Google Docs.

With Excel hardly at the zenith of the data presentation world (some apps like Jaspersoft are near this level) it’s not really possible to consider Google Docs a suitable contender while it rests a long, long way behind Excel.

2. Load And Usage Performance For Large Sheets

Many organisations use Excel to manage very large amounts of data and do modelling on that data using some pretty complicated formulas. I’d say being able to handle these sheets with ease on a mid-range PC typical of many office settings should be a very high priority.

Unfortunately Google Docs significantly disappoints. In some simple tests with opening one large sheet, amending some fields and waiting for the recalculation Excel actually finished before we were able to enter the field on Google. This was repeated on five separate mid range machines with above average Internet connections and two gigabytes of RAM.

3. The Range Of Data Analysis And Presentation Tools In Excel Blows Google Away

Google does have the basics right here. Sorting data and pivot tables worked well and relatively quickly (although on the machines we tested Excel was always faster as noted above) and was easy to figure out for an experienced Spreadsheet user.

The Data Analysis TookPak in Excel however is just so powerful it’ll be a long time before any Spreadsheet can match it – let alone an online-only solution. Many challenging University level and beyond projects can be completed using these tools as most statistical functions are supported. A good guide can be found here which also uncovers some of the pitfalls of using a spreadsheet for some of these tasks.

4. Change In Scripting Language

Whilst this is not necessarily a fair criticism of Google Docs they have chosen to use Javascript as their scripting language whereas Microsoft uses VBA (Visual Basic for Applications). Not only are more casual users familiar with BASIC from their education – it takes school children who’ve learned BASIC just a few hours to get scripting in Excel at a simple level but existing spreadsheets are now significantly more difficult to port to Google Docs. Any corporation with a range of large sheets containing scripts needs to think very carefully about a move.

5. Security Concerns

“Security consultant Ade Barkah checked in with us to alert us to a couple of serious security issues associated to Google Docs, the web-based office software from the world’s most famous search engine company, giving a whole new meaning to its mission to make the world’s information universally accessible.”
TechCrunch 2009

Throughout the years since it’s launched there have been a number of so far minor security issues with Google Docs. However minor these have been and however quickly Google has fixed them corporations used to controlling their own security will be concerned with these issues. Microsoft hasn’t always had a perfect reputation for security but to be fair to them spreadsheets on your own private server haven’t been busy sharing a few graphs of private information with the World Wide Web.

Combining chart types

Wow, it’s been a long time. This is a post I had planned for some time ago but didn’t get around to until now – as many All Black supporters might say, about bloody time

In a made-up scenario I have two rates to report on, Metric 1 and Metric 2. An indicator of good performance is that Metric 2 sits 5-10% above Metric 1. So when I plot the monthly data, as well as the two Metrics time series I’ll add in another series called Difference, that will be calculated as Metric 2 – Metric 1. I’ll also add in the constant series Target 1 (5%) and Target 2 (10%), and what I want to see is the Difference series sitting between these two targets. Here’s what that looks like:

Now, I don’t know what you think of that, but to me it’s confusing. The difference between Metric 1 and 2 and where it sits in relation to the 5% and 10% marks is the key indicator I need to follow, but both Metric 1 and 2 are also important to provide context. Putting them all together on the same line chart makes it harder than it needs to be to see what the indicator is doing.

Thankfully, there is a much simpler way to present this, which avoids the need for the eye to track all five time series together on the chart, and also manages to signal much more clearly where the key indicator is in relation to its target. To do this I can use the relatively simple expedient of combining different chart types.

First, I right-click the Metric 1 time series, select ‘Change Series Chart Type…’ and pick out the ‘Stacked Area’ chart type:

I also format the fill of the area series to a red colour:

Next, I make the same change of chart type to the Target 1 time series. I change the colour to orange (or amber if you prefer), and the name of the series to ‘0-5% above’:

I remove both the Difference and the Target 2 time series from the chart, and add the Target 1 series in again, by copying the data range from the worksheet, selecting the chart and pasting. Once that’s done, I again change the chart type of the new series to stacked area, set the fill colour to green and rename it to ‘5-10% above’. I also change the line colour of the Metric 2 series to a dark blue to provide a little more contrast against the area charts:

And that’s done. We now don’t need to show the ‘Difference’ series at all, as the magnitude of the difference between Metric 1 and 2 is clear without it. I also find the chart a lot simpler and more visually appealing. I should note in passing that I very rarely use the stacked area chart type, but it makes sense to me in this context.

Combining chart types like this is a simple and effective but under-used method – I guess that’s one side effect of the chart wizard, people assume that once the chart type is selected, that’s it. I first came across the idea at Jon Peltier’s site (this page has a lot of cool examples), it’s been a handy tool ever since.

An Undo Stack in VBA

I’m working on an Excel add-in which will allow a user to perform operations on cells. I’d also like to allow the user to ‘undo’ these operations. Excel itself, of course, does have an Undo button (shortcut Ctrl + Z), but this works only to undo changes made in the user interface, and is not going to help for changes made by VBA. This means the add-in will need to remember each operation performed and have them ready to supply for un-doing should the user want to do so.

Many people have created their own general solutions for undoing changes made in VBA – for instance, here’s one from Jan Karel Pieterse (who also created the hugely useful Name Manager add-in): Undo With Excel VBA. However I decided not to use a solution like this – it’s quite a lot of code to add to a project, and it is very general.

In this case I’m happy to write a more specific solution, which copes with undoing a specific action, by simply delegating the work of undoing the action to the object which did the action in the first place. Here’s an example:

' CellTest Class - allows calling code to change the value of a
' cell and then change it back

'*****************************************************************
' Private Variables
'*****************************************************************
Private mCell As Range
Private vOrigFormula As Variant

'*****************************************************************
' Public Properties
'*****************************************************************
Public Property Get Address() As String
  Address = mCell.Address
End Property

Public Property Set Cell(ByRef rCell As Range)
  Set mCell = rCell
  vOrigFormula = rCell.Formula
End Property

'*****************************************************************
' Public Methods
'*****************************************************************
Public Sub Change(ByVal sText As String)
  mCell.Value = sText
End Sub

Public Sub ChangeBack()
  mCell.Formula = vOrigFormula
End Sub

So what about storing up a list of operations for undoing? This is ideally suited to a stack. Stacks are neat data structures – they don’t do much, but they do it well and they’re really easy to code. Here’s one I picked up from the VBA Developer’s Handbook and tweaked a little to suit. There are two classes, StackItem and Stack. StackItem simply stores a value and a reference to another StackItem object – the next item on the stack.

' StackItem class holds a reference to the object it refers to and
' a reference to the next item in the stack

Public Value As Variant
Public NextItem As StackItem

Stack stores a reference to the item on the top of the stack, and exposes the functions Push (to push a new item on to the top) and Pop (to remove the top item and return a reference to it). Properties IsEmpty and Peek tell you if there are any items in the stack, and what the value of the top item is (without removing it), respectively. There’s a very nice visual explanation of how this looks here.

' Stack holds a reference to the top item in the stack data structure
' and manages the adding and removing of stack items

Private siTop As StackItem

'*****************************************************************
' Public Properties
'*****************************************************************
Public Property Get IsEmpty() As Boolean
    IsEmpty = siTop Is Nothing
End Property

Public Property Get Peek() As Variant
    If IsObject(siTop.Value) Then
        Set Peek = siTop.Value
    Else
        Peek = siTop.Value
    End If
End Property

'*****************************************************************
' Public Methods
'*****************************************************************
Public Function Push(ByRef varIn As Variant) As Boolean

    Dim siNew As StackItem

On Error GoTo PushError

    Set siNew = New StackItem
    If IsObject(varIn) Then
        Set siNew.Value = varIn
    Else
        siNew.Value = varIn
    End If
    Set siNew.NextItem = siTop
    Set siTop = siNew
    
    Push = True
    
Exit Function

PushError:
    Push = False
End Function

Public Function Pop() As Variant

    If IsObject(siTop.Value) Then
        Set Pop = siTop.Value
    Else
        Pop = siTop.Value
    End If
    Set siTop = siTop.NextItem

End Function

So now to store an actions for undoing, I just need to push the object which performed it onto the undo stack. Every time I wish to actually undo an action, I just pop the object reference off, and instruct it to undo.

Next time, I’ll demonstrate how this looks in practice.

Why Corporate Excel Training Sucks For Everyone

Today’s post comes from Sean Duffy, CEO at Excel Everest.

http://www.ExcelEverest.com | Corporations need to rethink the way they train their employees in Microsoft Excel.

Corporate Excel trainings, if they are offered at all, are usually three hour classes in which 20 people gather around a projector and more or less watch an instructor go through various features of Excel. Occasionally, students are asked to join along and type in a formula or two, but on the whole, it’s usually a teaching exercise, not a learning exercise.

At Excel Everest, we’ve heard this numerous times from numerous people, and we built our whole business around the premise that firstly, those who have a strong grounding of Excel inside organizations tend to be more successful, and secondly, that the current method of training employees is antique.

After having gone through a number of Excel trainings ourselves, we don’t think people are *actually* learning from the corporate Excel classes. Along with countless users we’ve talked to, a recent bit of feedback from an Excel Everest customer lays this out pretty clearly:

I have attended numerous one and two day programs “teaching” excel. I found that once the program is over, I am no better than I was before. For me most programs show us all the things Excel can do but they don’t really teach you Excel. Your program is the first that I feel that I am learning it such that I can use it. I also like that I can repeat and review any particular lesson any time. If you don’t do/use it you lose it. – Kimber

Kimber, like many others, has been there… she’s attended classes that have no relevance, no real world scenarios, and no accountability. Plus, our guess is that it’d be difficult to describe any of them as “fun.”

No only is this a losing situation for people like Kimber, but from the perspective of the company who’s paying for trainings, it’s also a risky proposition. There’s no way to know that employees are actually learning from the trainings, given that there’s no accountability. Return on investment simply cannot be shown.

We’ve thought deeply how to fix this problem of ROI as well as the problem of employees, well, not learning Excel, and we built Excel Everest, a complete training course in Excel that’s built entirely into an Excel file. Imagine a huge workbook that teaches employees Excel, challenges them with exercises and grades you automatically in a comprehensive scoreboard. We’ve built Excel Everest to be a self-service and fun learning tool. An employee can complete the tutorial on her own time.

From the employers standpoint, this makes perfect sense as well. If an employee completes Excel Everest, she can simply send the completed document over to her manager, and her manager can quickly see that she has learned the material. It’s as easy as that. The employer knows that the money they spent on Excel Everest is paying off.

If you’re interested in using Excel Everest inside your organization, say hello at http://www.ExcelEverest.com/Contact.aspx

// Excel Everest (http://www.ExcelEverest.com) is a complete Excel training course built inside an Excel file. The goal of Excel Everest is to provide an immersive, interactive, learning Experience for employees, all while demonstrating a return on investment to employers. Excel Everest is currently being used in organizations such as Google, Hymans Roberston, and PlayCore.