More Regex Functions, part 2


Part 2 of 3

After building the form in Part 1, the next stage of coding the regex filter involves implementing the logic for deciding which rows of the list to be filtered will pass through the filter. Each row in the criteria range (rngCriteria) can be thought of as a test made up of a set of criteria. In order for a row in the list range (rngSource) to pass the test, all of these criteria must be met. If the list row passes the test for any row in rngCriteria, then it will be included in the eventual output.

Given a criteria range with m rows (excluding the header row) and n columns, this can be modelled in pseudo-SQL:

SELECT * FROM rngSource AS a
WHERE 
	(Crit[1,1](a) AND ... AND Crit[1,n](a))
	OR ...
	OR (Crit[m,1](a) AND ... AND Crit[m,n](a))

where Crit[i,j](a) is a boolean function representing the criterion in row i, column j.

For the built-in advanced filter, Excel constructs these criteria by parsing the text in the criteria range – a large range of tests are available, including the use of relational operators, wildcard and exact text matches. For an introduction to use of the advanced filter, see the tutorial at Contextures.

For our purposes, though, we’ll be allowing only regular expression pattern matches on the text in the list rows. While in theory it would be possible to mimic all the built-in advanced filter functionality and add regular expression tests into the mix, I’m not attempting to extend the advanced filter at this point – instead I’m creating an alternative to it.

In order to test the rows, I decided to implement a couple of classes to contain them – one for the rows in the list and extract ranges, and one which works slightly differently for the criteria range. In both classes, the core idea is to use the Scripting.Dictionary object to store key-value pairs, where the keys are provided by the header rows and the values by the text in each cell.

The first of these, CRowDict, is what I intend to use for the rows in the list and extract ranges:

Option Explicit

' CRowDict - wraps dictionary, holds key-value pairings
' for each row in the list

'********************************************************************
' Class private attributes
'********************************************************************

' holds dictionary object to be populated
Private mdicValues As Scripting.Dictionary
Private Const msCLASS As String = "CRowDict"

'********************************************************************
' Class public properties
'********************************************************************
Public Property Get Item(ByVal sKey As String) As Variant
    If mdicValues.Exists(sKey) Then
        Item = mdicValues.Item(sKey)
    Else
        Err.Raise glNOTFOUND, msCLASS, gsNOTFOUND & sKey
    End If
End Property

'********************************************************************
' Class public methods
'********************************************************************

Public Sub Populate(ByRef rngHeaders As Excel.Range, _
                ByRef rngValues As Excel.Range)
    ' Populates dictionary from the row rngValues
    
    Dim colNum As Long
    
    For colNum = 1 To rngHeaders.Cells.Count
        mdicValues.Add rngHeaders(1, colNum).Text, _
                        rngValues(1, colNum).Text
    Next colNum

End Sub

Public Sub Display()
    ' Convenience method for testing
    Dim vKey As Variant
    For Each vKey In mdicValues.Keys
        Debug.Print vKey, mdicValues.Item(vKey)
    Next vKey
End Sub

'********************************************************************
' Class private methods
'********************************************************************

Private Sub Class_Initialize()
    Set mdicValues = New Scripting.Dictionary
End Sub

The idea here is to allow us to access the text values in each row by name rather than position – this allows us the flexibility to change the order and/or number of columns returned to the extract range.

The class CRXRowDict is very similar, but stores RegExp objects rather than text. It also includes a Test method, which operates on CRowDict instances and returns True if all of the required text values match the corresponding regex patterns:

Option Explicit

' CRXRowDict - wraps dictionary, holds key-value pairings
' for each row in the list, where values are RegExp objects

'********************************************************************
' Class private attributes
'********************************************************************

' holds dictionary object to be populated
Private mdicValues As Scripting.Dictionary
Private Const msCLASS As String = "CRXRowDict"

'********************************************************************
' Class public properties
'********************************************************************
Public Property Get Item(ByVal sKey As String) _
                            As VBScript_RegExp_55.RegExp
    If mdicValues.Exists(sKey) Then
        Set Item = mdicValues.Item(sKey)
    Else
        Err.Raise glNOTFOUND, msCLASS, gsNOTFOUND & sKey
    End If
End Property

'********************************************************************
' Class public methods
'********************************************************************

Public Sub Populate(ByRef rngHeaders As Excel.Range, _
                ByRef rngValues As Excel.Range)
    ' Populates dictionary from the row rngValues
    
    Dim colNum As Long
    Dim objRegex As VBScript_RegExp_55.RegExp
    
    ' create new regexp object for each cell in the row
    For colNum = 1 To rngHeaders.Cells.Count
        Set objRegex = New VBScript_RegExp_55.RegExp
        objRegex.Pattern = rngValues(1, colNum).Text
        mdicValues.Add rngHeaders(1, colNum).Text, objRegex
    Next colNum

End Sub

Public Function Test(ByRef clsRow As CRowDict) As Boolean
    ' Executes the test method for each regexp object
    ' against the corresponding item in clsRow
    
    Dim objRegex As VBScript_RegExp_55.RegExp
    Dim iKey As Integer
    Dim sKey As String
    Dim sTest As String
    Dim bResult As Boolean
    
    ' Assume true to begin with
    bResult = True
    
    For iKey = LBound(mdicValues.Keys) To UBound(mdicValues.Keys)
        sKey = mdicValues.Keys(iKey)
        Set objRegex = mdicValues(sKey)
        sTest = clsRow.Item(sKey)
        Debug.Print objRegex.Pattern, sTest
        ' if any test fails, return false
        If Not objRegex.Test(sTest) Then
            bResult = False
            Exit For
        End If
    Next iKey
    
    Test = bResult
    
End Function

'********************************************************************
' Class private methods
'********************************************************************

Private Sub Class_Initialize()
    Set mdicValues = New Scripting.Dictionary
End Sub

In the final part of this series (which I hope to put out by next week) I’ll be putting this all together, showing how these classes are used to implement the filter function.

Advertisements

2 thoughts on “More Regex Functions, part 2

  1. Alex says:

    Hi Geoff,
    looks like the blog is not updated so much recently. Will you still follow up this topic and demonstrate how to use the RegEx filter?
    It would help me a lot and I am sure many others, too!
    Thanks and regards,
    Alex

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out / Change )

Twitter picture

You are commenting using your Twitter account. Log Out / Change )

Facebook photo

You are commenting using your Facebook account. Log Out / Change )

Google+ photo

You are commenting using your Google+ account. Log Out / Change )

Connecting to %s