Dictionary vs Collection

A pretty common requirement in programming is to be able to create a group of related objects, and put them into some sort of container. This means they can effectively be treated as one object, while still being accessible individually. For example, if I’m tracking the stats of a sports team over the course of a game, I could create a bunch of Player objects, all of which have individual attributes, but which I can view collectively as a Team object.

One way to group the Players would be to put them into a Team array. A weakness of this approach is that the only way to access one of the Players is to use their position in the array. This is ok for arrays where the positions of the members aren’t going to change, but if this isn’t the case it gets harder to keep track of things the longer the array persists in memory and the more changes need to be made to it. What is useful in this case is to be able to retrieve a Player object by using a key that uniquely identifies them, such as their name for instance, and not have to worry about which position in the Team array they are in.

The VB Collection object allows you to do this, by giving you the option of defining a ‘key’ value when the item is added to it – for instance, the below adds the range object identified by the name RangeName to a collection, and uses RangeName as the key:

Dim col As Collection
Dim rng As Excel.Range
Dim sRangeName As String
Set col = New Collection
sRangeName = "RangeName"
Set rng = Range(sRangeName)
col.Add rng, sRangeName

Now, when you wish to retrieve a particular item from the collection, you have the option of either specifying its position (index), or using the key value. If you know this was the first item added and its position hasn’t changed, you can use

col.Item(1)

(VB collections are 1-based). Otherwise, you can use

col.Item(sRangeName)

You can also iterate over the collection using For Each…Next syntax, which is both more concise and faster than accessing an array one item at a time.

Collections are fine – and pretty much all the container structures in the Excel object model use them, which is why you can say

For Each wb in Workbooks

or

For Each cell in rng.Cells

However, I hardly ever use them for implementing my own containers – where key-based lookup is required I prefer the VBScript Dictionary. The syntax for adding objects to a dictionary is very similar to adding items to a collection, apart from the reversal of the key and item arguments:

Dim dict As Scripting.Dictionary
Dim rng As Excel.Range
Dim sRangeName As String
Set dict = New Scripting.Dictionary
sRangeName = "RangeName"
Set rng = Range(sRangeName)
dict.Add sRangeName, rng

as is the retrieval of items:

dict.Item(sRangeName)

except for the fact that you have to use the key (which is safer in any case).

What really sells the dictionary to me over the collection are the following 3 methods:

dict.Exists(k) – boolean to let you know whether or not the key k has been added to the dictionary. This is very useful, as it avoids the need for the clunky idiom you’re forced to use to check the same thing with collections:

Public Function Exists(ByRef col As Collection, _
                    ByVal sKey As String) As Boolean

    Dim lCheck as Long
    On Error Resume Next
    lCheck = VarType(col.Item(sKey))
    If Err.Number = 0 Then
        Exists = True
    Else
        Exists = False
    End If

End Function

dict.Keys – Return an array containing all keys in the dictionary, which can then be iterated over. This for me is where the collection really falls down, as while it’s certainly possible to iterate over the items in a collection, the key cannot be returned from the item – which means it’s not available. A good example of why you might want this is the following from the CRXRowDict.Test function in the last post:

For iKey = LBound(mdicValues.Keys) To UBound(mdicValues.Keys)
        sKey = mdicValues.Keys(iKey)
        Set objRegex = mdicValues(sKey)
        sTest = clsRow.Item(sKey)
        Debug.Print objRegex.Pattern, sTest
        ' if any test fails, return false
        If Not objRegex.Test(sTest) Then
            bResult = False
            Exit For
        End If
    Next iKey

The key is required in two places – once to retrieve the matching regexp object from the dictionary and once to retrieve the test string from the CRowDict instance. The only way I could see to achieve the same result with a collection would be to populate and iterate over a separate array – not pretty.

dict.RemoveAll – As the name suggests, clean out all key-item pairs in the dictionary. No need to loop through either keys or items here, quick and easy.

More Regex Functions, part 2

Part 2 of 3

After building the form in Part 1, the next stage of coding the regex filter involves implementing the logic for deciding which rows of the list to be filtered will pass through the filter. Each row in the criteria range (rngCriteria) can be thought of as a test made up of a set of criteria. In order for a row in the list range (rngSource) to pass the test, all of these criteria must be met. If the list row passes the test for any row in rngCriteria, then it will be included in the eventual output.

Given a criteria range with m rows (excluding the header row) and n columns, this can be modelled in pseudo-SQL:

SELECT * FROM rngSource AS a
WHERE 
	(Crit[1,1](a) AND ... AND Crit[1,n](a))
	OR ...
	OR (Crit[m,1](a) AND ... AND Crit[m,n](a))

where Crit[i,j](a) is a boolean function representing the criterion in row i, column j.

For the built-in advanced filter, Excel constructs these criteria by parsing the text in the criteria range – a large range of tests are available, including the use of relational operators, wildcard and exact text matches. For an introduction to use of the advanced filter, see the tutorial at Contextures.

For our purposes, though, we’ll be allowing only regular expression pattern matches on the text in the list rows. While in theory it would be possible to mimic all the built-in advanced filter functionality and add regular expression tests into the mix, I’m not attempting to extend the advanced filter at this point – instead I’m creating an alternative to it.

In order to test the rows, I decided to implement a couple of classes to contain them – one for the rows in the list and extract ranges, and one which works slightly differently for the criteria range. In both classes, the core idea is to use the Scripting.Dictionary object to store key-value pairs, where the keys are provided by the header rows and the values by the text in each cell.

The first of these, CRowDict, is what I intend to use for the rows in the list and extract ranges:

Option Explicit

' CRowDict - wraps dictionary, holds key-value pairings
' for each row in the list

'********************************************************************
' Class private attributes
'********************************************************************

' holds dictionary object to be populated
Private mdicValues As Scripting.Dictionary
Private Const msCLASS As String = "CRowDict"

'********************************************************************
' Class public properties
'********************************************************************
Public Property Get Item(ByVal sKey As String) As Variant
    If mdicValues.Exists(sKey) Then
        Item = mdicValues.Item(sKey)
    Else
        Err.Raise glNOTFOUND, msCLASS, gsNOTFOUND & sKey
    End If
End Property

'********************************************************************
' Class public methods
'********************************************************************

Public Sub Populate(ByRef rngHeaders As Excel.Range, _
                ByRef rngValues As Excel.Range)
    ' Populates dictionary from the row rngValues
    
    Dim colNum As Long
    
    For colNum = 1 To rngHeaders.Cells.Count
        mdicValues.Add rngHeaders(1, colNum).Text, _
                        rngValues(1, colNum).Text
    Next colNum

End Sub

Public Sub Display()
    ' Convenience method for testing
    Dim vKey As Variant
    For Each vKey In mdicValues.Keys
        Debug.Print vKey, mdicValues.Item(vKey)
    Next vKey
End Sub

'********************************************************************
' Class private methods
'********************************************************************

Private Sub Class_Initialize()
    Set mdicValues = New Scripting.Dictionary
End Sub

The idea here is to allow us to access the text values in each row by name rather than position – this allows us the flexibility to change the order and/or number of columns returned to the extract range.

The class CRXRowDict is very similar, but stores RegExp objects rather than text. It also includes a Test method, which operates on CRowDict instances and returns True if all of the required text values match the corresponding regex patterns:

Option Explicit

' CRXRowDict - wraps dictionary, holds key-value pairings
' for each row in the list, where values are RegExp objects

'********************************************************************
' Class private attributes
'********************************************************************

' holds dictionary object to be populated
Private mdicValues As Scripting.Dictionary
Private Const msCLASS As String = "CRXRowDict"

'********************************************************************
' Class public properties
'********************************************************************
Public Property Get Item(ByVal sKey As String) _
                            As VBScript_RegExp_55.RegExp
    If mdicValues.Exists(sKey) Then
        Set Item = mdicValues.Item(sKey)
    Else
        Err.Raise glNOTFOUND, msCLASS, gsNOTFOUND & sKey
    End If
End Property

'********************************************************************
' Class public methods
'********************************************************************

Public Sub Populate(ByRef rngHeaders As Excel.Range, _
                ByRef rngValues As Excel.Range)
    ' Populates dictionary from the row rngValues
    
    Dim colNum As Long
    Dim objRegex As VBScript_RegExp_55.RegExp
    
    ' create new regexp object for each cell in the row
    For colNum = 1 To rngHeaders.Cells.Count
        Set objRegex = New VBScript_RegExp_55.RegExp
        objRegex.Pattern = rngValues(1, colNum).Text
        mdicValues.Add rngHeaders(1, colNum).Text, objRegex
    Next colNum

End Sub

Public Function Test(ByRef clsRow As CRowDict) As Boolean
    ' Executes the test method for each regexp object
    ' against the corresponding item in clsRow
    
    Dim objRegex As VBScript_RegExp_55.RegExp
    Dim iKey As Integer
    Dim sKey As String
    Dim sTest As String
    Dim bResult As Boolean
    
    ' Assume true to begin with
    bResult = True
    
    For iKey = LBound(mdicValues.Keys) To UBound(mdicValues.Keys)
        sKey = mdicValues.Keys(iKey)
        Set objRegex = mdicValues(sKey)
        sTest = clsRow.Item(sKey)
        Debug.Print objRegex.Pattern, sTest
        ' if any test fails, return false
        If Not objRegex.Test(sTest) Then
            bResult = False
            Exit For
        End If
    Next iKey
    
    Test = bResult
    
End Function

'********************************************************************
' Class private methods
'********************************************************************

Private Sub Class_Initialize()
    Set mdicValues = New Scripting.Dictionary
End Sub

In the final part of this series (which I hope to put out by next week) I’ll be putting this all together, showing how these classes are used to implement the filter function.