Feeds:
Posts
Comments

A pretty common requirement in programming is to be able to create a group of related objects, and put them into some sort of container. This means they can effectively be treated as one object, while still being accessible individually. For example, if I’m tracking the stats of a sports team over the course of a game, I could create a bunch of Player objects, all of which have individual attributes, but which I can view collectively as a Team object.

One way to group the Players would be to put them into a Team array. A weakness of this approach is that the only way to access one of the Players is to use their position in the array. This is ok for arrays where the positions of the members aren’t going to change, but if this isn’t the case it gets harder to keep track of things the longer the array persists in memory and the more changes need to be made to it. What is useful in this case is to be able to retrieve a Player object by using a key that uniquely identifies them, such as their name for instance, and not have to worry about which position in the Team array they are in.

The VB Collection object allows you to do this, by giving you the option of defining a ‘key’ value when the item is added to it – for instance, the below adds the range object identified by the name RangeName to a collection, and uses RangeName as the key:

Dim col As Collection
Dim rng As Excel.Range
Dim sRangeName As String
Set col = New Collection
sRangeName = "RangeName"
Set rng = Range(sRangeName)
col.Add rng, sRangeName

Now, when you wish to retrieve a particular item from the collection, you have the option of either specifying its position (index), or using the key value. If you know this was the first item added and its position hasn’t changed, you can use

col.Item(1)

(VB collections are 1-based). Otherwise, you can use

col.Item(sRangeName)

You can also iterate over the collection using For Each…Next syntax, which is both more concise and faster than accessing an array one item at a time.

Collections are fine – and pretty much all the container structures in the Excel object model use them, which is why you can say

For Each wb in Workbooks

or

For Each cell in rng.Cells

However, I hardly ever use them for implementing my own containers – where key-based lookup is required I prefer the VBScript Dictionary. The syntax for adding objects to a dictionary is very similar to adding items to a collection, apart from the reversal of the key and item arguments:

Dim dict As Scripting.Dictionary
Dim rng As Excel.Range
Dim sRangeName As String
Set dict = New Scripting.Dictionary
sRangeName = "RangeName"
Set rng = Range(sRangeName)
dict.Add sRangeName, rng

as is the retrieval of items:

dict.Item(sRangeName)

except for the fact that you have to use the key (which is safer in any case).

What really sells the dictionary to me over the collection are the following 3 methods:

dict.Exists(k) – boolean to let you know whether or not the key k has been added to the dictionary. This is very useful, as it avoids the need for the clunky idiom you’re forced to use to check the same thing with collections:

Public Function Exists(ByRef col As Collection, _
                    ByVal sKey As String) As Boolean

    Dim lCheck as Long
    On Error Resume Next
    lCheck = VarType(col.Item(sKey))
    If Err.Number = 0 Then
        Exists = True
    Else
        Exists = False
    End If

End Function

dict.Keys – Return an array containing all keys in the dictionary, which can then be iterated over. This for me is where the collection really falls down, as while it’s certainly possible to iterate over the items in a collection, the key cannot be returned from the item – which means it’s not available. A good example of why you might want this is the following from the CRXRowDict.Test function in the last post:

For iKey = LBound(mdicValues.Keys) To UBound(mdicValues.Keys)
        sKey = mdicValues.Keys(iKey)
        Set objRegex = mdicValues(sKey)
        sTest = clsRow.Item(sKey)
        Debug.Print objRegex.Pattern, sTest
        ' if any test fails, return false
        If Not objRegex.Test(sTest) Then
            bResult = False
            Exit For
        End If
    Next iKey

The key is required in two places – once to retrieve the matching regexp object from the dictionary and once to retrieve the test string from the CRowDict instance. The only way I could see to achieve the same result with a collection would be to populate and iterate over a separate array – not pretty.

dict.RemoveAll – As the name suggests, clean out all key-item pairs in the dictionary. No need to loop through either keys or items here, quick and easy.

Part 2 of 3

After building the form in Part 1, the next stage of coding the regex filter involves implementing the logic for deciding which rows of the list to be filtered will pass through the filter. Each row in the criteria range (rngCriteria) can be thought of as a test made up of a set of criteria. In order for a row in the list range (rngSource) to pass the test, all of these criteria must be met. If the list row passes the test for any row in rngCriteria, then it will be included in the eventual output.

Given a criteria range with m rows (excluding the header row) and n columns, this can be modelled in pseudo-SQL:

SELECT * FROM rngSource AS a
WHERE
	(Crit[1,1](a) AND ... AND Crit[1,n](a))
	OR ...
	OR (Crit[m,1](a) AND ... AND Crit[m,n](a))

where Crit[i,j](a) is a boolean function representing the criterion in row i, column j.

For the built-in advanced filter, Excel constructs these criteria by parsing the text in the criteria range – a large range of tests are available, including the use of relational operators, wildcard and exact text matches. For an introduction to use of the advanced filter, see the tutorial at Contextures.

For our purposes, though, we’ll be allowing only regular expression pattern matches on the text in the list rows. While in theory it would be possible to mimic all the built-in advanced filter functionality and add regular expression tests into the mix, I’m not attempting to extend the advanced filter at this point – instead I’m creating an alternative to it.

In order to test the rows, I decided to implement a couple of classes to contain them – one for the rows in the list and extract ranges, and one which works slightly differently for the criteria range. In both classes, the core idea is to use the Scripting.Dictionary object to store key-value pairs, where the keys are provided by the header rows and the values by the text in each cell.

The first of these, CRowDict, is what I intend to use for the rows in the list and extract ranges:

Option Explicit

' CRowDict - wraps dictionary, holds key-value pairings
' for each row in the list

'********************************************************************
' Class private attributes
'********************************************************************

' holds dictionary object to be populated
Private mdicValues As Scripting.Dictionary
Private Const msCLASS As String = "CRowDict"

'********************************************************************
' Class public properties
'********************************************************************
Public Property Get Item(ByVal sKey As String) As Variant
    If mdicValues.Exists(sKey) Then
        Item = mdicValues.Item(sKey)
    Else
        Err.Raise glNOTFOUND, msCLASS, gsNOTFOUND & sKey
    End If
End Property

'********************************************************************
' Class public methods
'********************************************************************

Public Sub Populate(ByRef rngHeaders As Excel.Range, _
                ByRef rngValues As Excel.Range)
    ' Populates dictionary from the row rngValues

    Dim colNum As Long

    For colNum = 1 To rngHeaders.Cells.Count
        mdicValues.Add rngHeaders(1, colNum).Text, _
                        rngValues(1, colNum).Text
    Next colNum

End Sub

Public Sub Display()
    ' Convenience method for testing
    Dim vKey As Variant
    For Each vKey In mdicValues.Keys
        Debug.Print vKey, mdicValues.Item(vKey)
    Next vKey
End Sub

'********************************************************************
' Class private methods
'********************************************************************

Private Sub Class_Initialize()
    Set mdicValues = New Scripting.Dictionary
End Sub

The idea here is to allow us to access the text values in each row by name rather than position – this allows us the flexibility to change the order and/or number of columns returned to the extract range.

The class CRXRowDict is very similar, but stores RegExp objects rather than text. It also includes a Test method, which operates on CRowDict instances and returns True if all of the required text values match the corresponding regex patterns:

Option Explicit

' CRXRowDict - wraps dictionary, holds key-value pairings
' for each row in the list, where values are RegExp objects

'********************************************************************
' Class private attributes
'********************************************************************

' holds dictionary object to be populated
Private mdicValues As Scripting.Dictionary
Private Const msCLASS As String = "CRXRowDict"

'********************************************************************
' Class public properties
'********************************************************************
Public Property Get Item(ByVal sKey As String) _
                            As VBScript_RegExp_55.RegExp
    If mdicValues.Exists(sKey) Then
        Set Item = mdicValues.Item(sKey)
    Else
        Err.Raise glNOTFOUND, msCLASS, gsNOTFOUND & sKey
    End If
End Property

'********************************************************************
' Class public methods
'********************************************************************

Public Sub Populate(ByRef rngHeaders As Excel.Range, _
                ByRef rngValues As Excel.Range)
    ' Populates dictionary from the row rngValues

    Dim colNum As Long
    Dim objRegex As VBScript_RegExp_55.RegExp

    ' create new regexp object for each cell in the row
    For colNum = 1 To rngHeaders.Cells.Count
        Set objRegex = New VBScript_RegExp_55.RegExp
        objRegex.Pattern = rngValues(1, colNum).Text
        mdicValues.Add rngHeaders(1, colNum).Text, objRegex
    Next colNum

End Sub

Public Function Test(ByRef clsRow As CRowDict) As Boolean
    ' Executes the test method for each regexp object
    ' against the corresponding item in clsRow

    Dim objRegex As VBScript_RegExp_55.RegExp
    Dim iKey As Integer
    Dim sKey As String
    Dim sTest As String
    Dim bResult As Boolean

    ' Assume true to begin with
    bResult = True

    For iKey = LBound(mdicValues.Keys) To UBound(mdicValues.Keys)
        sKey = mdicValues.Keys(iKey)
        Set objRegex = mdicValues(sKey)
        sTest = clsRow.Item(sKey)
        Debug.Print objRegex.Pattern, sTest
        ' if any test fails, return false
        If Not objRegex.Test(sTest) Then
            bResult = False
            Exit For
        End If
    Next iKey

    Test = bResult

End Function

'********************************************************************
' Class private methods
'********************************************************************

Private Sub Class_Initialize()
    Set mdicValues = New Scripting.Dictionary
End Sub

In the final part of this series (which I hope to put out by next week) I’ll be putting this all together, showing how these classes are used to implement the filter function.

Project Euler #18 and #67

I’m hoping to get back to work on the regular expressions filter soon – should have some time over the Christmas/New Year break – but in the meantime, I took a look at the Euler problems related to number triangles. Both problems ask for the maximal path through adjacent points in the triangle – from problem 18:

By starting at the top of the triangle below and moving to adjacent numbers on the row below, the maximum total from top to bottom is 23.

3
7 4
2 4 6
8 5 9 3

That is, 3 + 7 + 4 + 9 = 23.

Find the maximum total from top to bottom of the triangle below:

75
95 64
17 47 82
18 35 87 10
20 04 82 47 65
19 01 23 75 03 34
88 02 77 73 07 63 67
99 65 04 28 06 16 70 92
41 41 26 56 83 40 80 70 33
41 48 72 33 47 32 37 16 94 29
53 71 44 65 25 43 91 52 97 51 14
70 11 33 28 77 73 17 78 39 68 17 57
91 71 52 38 17 14 91 43 58 50 27 29 48
63 66 04 68 89 53 67 30 73 16 69 87 40 31
04 62 98 27 23 09 70 98 73 93 38 53 60 04 23

It is of course possible to work out the sum of every single path, by starting at the top and trying every combination of left and right choices to get to the bottom, but for a triangle with n rows this works out at 2n-1 sums to calculate. This makes the brute-force approach impractical for the larger triangle in problem 67 (100 rows); it also involves a whole lot of unnecessary calculation.

The key to seeing this is understanding that (1) every point in the triangle above the bottom row is itself the apex of a sub-triangle, that has its own maximal path, and (2) where any sub-triangles intersect, this intersection contains another sub-triangle – so working out all the paths through the two intersecting triangles involves working out every path through the intersection twice. For example, in the triangle

3
7 4
2 4 6
8 5 9 3

the two sub-triangles with apexes at 7 and 4 both contain the sub-triangle

4
5 9

with two possible paths. So calculating every possible path through the larger triangle involves traversing both these paths twice, once for each sub-triangle containing them. It’s easy to see that this gets worse quickly as the triangle gets larger.

A smarter approach is what might be termed a ‘bottom-up’ method. Working from the base upwards, calculate the maximal path for every point in each row. Once you have the maxima for two adjacent points in a row, the maximum for the point above both these points is simply the sum of this point and the maximum of these two values. The advantage of this method is that the maximum for any point in the triangle is calculated once and only once. Here’s how this looks in Python.

First, a helper function get_triangle to read the string in and get a list of rows of integers making up the triangle:

def get_triangle(triangle_string):
    string_list=triangle_string.strip().split('\n')
    #reverse to work from the base up
    string_list.reverse()
    return [[int(x) for x in row.split()] \
        for row in string_list]

Then, the main routine max_path works through each row, progressively replacing the list of maxima at each point in the previous row with the same for the current row:

def max_path(triangle_string):
    triangle = get_triangle(triangle_string)
    #every point in the base is its own maximum
    max_list = triangle[0]
    for row in range(1, len(triangle)):
        #add the greater of the two adjacent maxima to
        #each point in the row
        max_list = \
            [triangle[row][x] + max(max_list[x], max_list[x+1]) \
             for x in range(len(triangle[row]))]
    return max_list[0]

That’s all for now – time permitting I will be posting the next part of the regular expression filter sometime in the next couple of days. Until then, hoping you all had a restful and happy Christmas and are taking some time off to be with family. Take care…

More Regex Functions

And finally, I’m back! I finally overcame the inertia yesterday and got back to work on the regular expressions add-in. The next goal is to include regular expressions equivalents for Excel’s built-in advanced filter and find/replace functions. So far I’ve concentrated on just the filter part, which I hope to have finished fairly soon. However as there’s a fair amount of code involved, I’ll split this into 2 or 3 posts.

Firstly, the advanced filter dialog. Not being a big fan of re-inventing the wheel I decided to build pretty much exactly the same form as you find Excel presenting you with for an ordinary advanced filter. Here’s what my version looks like:

 

All that this form needs to do is collect 3 key pieces of information to control the filter process:

  1. What ranges will the filter operate on?
  2. Will the source range (List Range) be filtered in place, or will the filtered results be copied to another place (Copy to)?
  3. Does the filtered list need to display unique records only?

Here’s the code behind the form. The public properties List, Criteria and Extract return the addresses for the ranges in (1) from the RefEdit controls, and the booleans CopyTo and UniqueOnly return the values for (2) and (3) respectively.

Option Explicit

Private mbOK As Boolean

'**********************************************************
' Form class public properties
'**********************************************************

' User clicked OK
Public Property Get OK() As Boolean
    OK = mbOK
End Property
'**********************************************************
' Ranges selected for filter
'**********************************************************
Public Property Get List() As String
' List range to be filtered
    List = refList.Value
End Property

Public Property Get Criteria() As String
' Range containing criteria for filter
    Criteria = refCriteria.Value
End Property

Public Property Get Extract() As String
' Range to copy filtered rows to
    Extract = refExtract.Value
End Property

'**********************************************************
' Properties modifying filter operation
'**********************************************************

' User selected Copy to range option button
Public Property Get CopyTo() As Boolean
    CopyTo = optCopy.Value
End Property

' User selected Unique values only checkbox
Public Property Get UniqueOnly() As Boolean
    UniqueOnly = chkUnique.Value
End Property

'**********************************************************
' Form controls
'**********************************************************
Private Sub cmdCancel_Click()
    mbOK = False
    Me.Hide
End Sub

Private Sub cmdOk_Click()
    mbOK = True
    Me.Hide
End Sub

Private Sub optCopy_Click()

    ' enable refExtract
    lblExtract.ForeColor = vbButtonText
    With refExtract
        .Enabled = True
        .BackColor = vbWindowBackground
    End With

End Sub

Private Sub optInPlace_Click()

    ' grey out and disable refExtract
    lblExtract.ForeColor = vbGrayText
    With refExtract
        .Enabled = False
        .BackColor = vbInactiveCaptionText
    End With

End Sub

Private Sub UserForm_Initialize()

    ' Set default values for List range and optInPlace
    refList.Value = Application.ActiveCell.CurrentRegion.Address
    optInPlace.Value = True

End Sub

Private Sub UserForm_QueryClose(Cancel As Integer, CloseMode As Integer)

    If CloseMode = vbFormControlMenu Then
        cmdCancel_Click
        Cancel = True
    End If

End Sub

And here’s the sub GetRanges, which ‘reads’ the form:

Private Sub GetRanges(ByRef rngSource As Excel.Range, _
                ByRef rngCriteria As Excel.Range, _
                ByRef rngExtract As Excel.Range, _
                ByRef bUnique As Boolean)

    Dim frmFilter As FRXFilter

    Set frmFilter = New FRXFilter
    frmFilter.Show

    If frmFilter.OK Then
        With frmFilter
            If .List <> "" Then _
                Set rngSource = Range(.List)
            If .Criteria <> "" Then _
                Set rngCriteria = Range(.Criteria)
            If .CopyTo Then _
                Set rngExtract = Range(.Extract)
            bUnique = .UniqueOnly
        End With
    End If

End Sub

I also want to include some error-handling to cope with bad range addresses, but apart from that it’s pretty much there. Next time I’ll post the sub at the next level up, calling GetRanges and a couple of other routines to perform the filtering.

SQLite in Ubuntu 9.04

There are a lot of good choices for a free/open-source DBMS in Ubuntu (or anywhere for that matter): MySQL and PostgreSQL would be the two that stand out most prominently. But when you’re developing a small-scale application which requires a small-scale database (whether or not you intend to scale up later), a client-server DBMS like that is the proverbial sledgehammer, with your data structure as the walnut. SQLite offers a file-based transactional SQL database engine, without the installation/configuration/administration time associated with a server-based DB. Think MS Access, cross-platform, minus all the forms.

I’m currently building a proof-of-concept application (for migrating a spreadmart system), using the Django framework to put a web form presentation layer across a db backend. SQLite is ideal for this sort of task, in that the development time is shortened – I can pretty much get down to writing code as soon as the design is complete. An interface to SQLite (sqlite3) is built in to Python since 2.5, so if you wish to create an SQLite database using Python, all you have to do is write the appropriate code to do so. Without using Python you’ll need to enable command-line access by installing the sqlite3 library:

sudo apt-get install sqlite3

or just open Synaptic and search for sqlite3.

So for instance to create a database file ‘expenses.db’, create a category table and populate one row in it, I can do the following from the command line:

$ sqlite3 expenses.db
SQLite version 3.6.10
Enter ".help" for instructions
Enter SQL statements terminated with a ";"
sqlite> create table category(
   ...> id integer primary key,
   ...> cat_code varchar(3),
   ...> cat_name varchar(20)
   ...> );
sqlite> insert into category values(
   ...> null, 'GRO', 'groceries'
   ...> );
sqlite> select * from category;
1|GRO|groceries

To do the same thing in Python:

>>> import sqlite3
>>> conn = sqlite3.connect('expenses.db')
>>> conn.execute('''create table category(
... id integer primary key,
... cat_code varchar(3),
... cat_name varchar(20)
... )''')
<sqlite3.Cursor object at 0xb77ce2c0>
>>> conn.commit()
>>> conn.execute('''insert into category values(
... null, 'GRO', 'groceries')''')
<sqlite3.Cursor object at 0xb77ce290>
>>> conn.commit()
>>> for row in conn.execute('select * from category'):
...   print row
...
(1, u'GRO', u'groceries')

If you prefer to have a GUI, you can install the SQLite Database Browser:

sudo apt-get install sqlitebrowser

This is a very handy (albeit rather basic) visual tool which gives you access to pretty much all the same functionality as you have from the command line:
sqlite-expenses.db

So that’s SQLite – simple, easy to use and very handy. Incidentally there’s a great blog post here extolling the virtues of SQLite in itself and also in combination with Excel. Intriguing reading and makes a great case for the benefits of using a “fractional horsepower” database. I’ll post more shortly on what this approach has allowed me to do with both Django and Excel.

Older Posts »