Summing a random sample

A colleague came to me recently with a problem:
I have a bunch of numbers in a spreadsheet, and I need to take a random sample of 50 of them, and sum it. I can do the sample by random sorting, but I need to repeat this several times, so I’d like a function that I can repeat without having to muck around.

Well, sampling – I have some code which will do just that. After that, I think I should be able to manage summing the array.

Here’s the code for creating a random sample:

Private Sub SampleArray(ByRef avBigset As Variant, ByRef avSmallset As Variant)
    ' SampleArray populates a random sample avSmallset from an array avBigset
    ' without replacement (each element in avBigset is considered once only)
    Dim lRemainder As Long
    Dim lSize As Long
    Dim lOb As Long
    Dim lPickit As Long
        
    ' Make sure we're dealing with arrays...
    If Not IsArray(avBigset) Or Not IsArray(avSmallset) Then Exit Sub
    ' Initialise
    lRemainder = UBound(avBigset)
    lSize = UBound(avSmallset)
    Randomize 0
    
    Do While lSize > 0  ' Still some left to pick up
        lOb = lOb + 1
        If Rnd < lSize / lRemainder Then
            lPickit = lPickit + 1
            avSmallset(lPickit) = avBigset(lOb)
            lSize = lSize - 1
        End If
        lRemainder = lRemainder - 1
    Loop    ' Sample complete
    
End Sub

The issue is that it’s built to work with 1d arrays, so given a range as input in the function, I would need to push the values of the range into an array. I could do that with something simple like the below:

Private Function arr(ByRef rng As Range) As Variant
    ' Convert range to 1-D array
    Dim i As Long
    Dim j As Long
    Dim k As Long
    Dim rowCount As Long
    Dim colcount As Long
    Dim lim As Long
    Dim sArr() As String
    Dim vArr As Variant
    
    vArr = rng.Value2
    
    ' dimension array
    rowCount = UBound(vArr)
    colcount = UBound(vArr, 2)
    lim = rowCount * colcount
    ReDim sArr(1 To lim)
    k = 1
    
    ' populate
    For i = 1 To rowCount
        For j = 1 To colcount
            sArr(k) = vArr(i, j)
            k = k + 1
        Next j
    Next i
    
    arr = sArr
    
End Function

This will do the trick, but performance-wise it works out pretty horribly. The bigger the array, the longer this step will take, and it’s all useless, unnecessary overhead. The better approach is I think to revise the SampleArray function so it will work with a range rather than an array.

Actually this turns out to be very easy. The SampleArray code loops over an array, but with a range, we can just loop over the Cells collection. The rest of the code is pretty much identical, except that we’ll use a For Each…Next loop rather than Do While:

Private Sub SampleRange(ByRef rngBigset As Variant, ByRef avSample As Variant)
    ' SampleRange populates a random sample avSample from a range rngBigset
    ' without replacement (each element in rngBigset is considered once only)
    Dim lRemainder As Long
    Dim lSize As Long
    Dim lPickit As Long
    Dim rngCell As Excel.Range
    
    
    ' Initialise
    lRemainder = rngBigset.Cells.Count
    lSize = UBound(avSample)
    Randomize 0
    
    For Each rngCell In rngBigset.Cells
        If lSize <= 0 Then Exit For ' Sample complete
        If Rnd < lSize / lRemainder Then
            lPickit = lPickit + 1
            avSample(lPickit) = rngCell
            lSize = lSize - 1
        End If
        lRemainder = lRemainder - 1
    Next rngCell

End Sub

So is there a significant performance gain in doing it this way, taking out the array function? Unsurprisingly, yes there is. I created two functions, SAMPLESUM, which requires the conversion to an array beforehand:

Public Function SAMPLESUM(ByRef in_range As Range, ByVal sample_size As Long) As Variant
    ' Sum a random sample of size sample_size from in_range
    Dim sample_array() As Double
    Dim in_array As Variant
    Dim index As Long
    
    ' set up large and small arrays
    in_array = arr(in_range)
    ReDim sample_array(1 To WorksheetFunction.Min(UBound(in_array), sample_size))
    
    ' get sample
    SampleArray in_array, sample_array
    ' sum sample array
    SAMPLESUM = WorksheetFunction.Sum(sample_array)
    
End Function

and SAMPLESUM2, working directly with the range:

Public Function SAMPLESUM2(ByRef in_range As Range, ByVal sample_size As Long) As Variant
    ' Sum a random sample of size sample_size from in_range
    Dim sample_array() As Double
    Dim index As Long
    
    ' set up large and small arrays
    ReDim sample_array(1 To WorksheetFunction.Min(in_range.Cells.Count, sample_size))
    
    ' get sample
    SampleRange in_range, sample_array
    ' sum sample array
    SAMPLESUM2 = WorksheetFunction.Sum(sample_array)

End Function

I then set up a function which timed 100 calculations of each version, taking a sample of size 10 from a range of numbers with 50,000 rows and 10 columns:

Public Sub timeit()
Dim starttime As Double
Dim endtime As Double
Dim totaltime As Double
Dim i As Integer
Dim numrange As Excel.Range

Set numrange = Range("A1:J50000")


starttime = Timer

For i = 1 To 100
    SAMPLESUM numrange, 10
Next i

endtime = Timer
totaltime = endtime - starttime

Debug.Print "Time taken for 100 calculations (SAMPLESUM): " & totaltime & " seconds"

starttime = Timer

For i = 1 To 100
    SAMPLESUM2 numrange, 10
Next i

endtime = Timer
totaltime = endtime - starttime

Debug.Print "Time taken for 100 calculations (SAMPLESUM2): " & totaltime & " seconds"

End Sub

Which output:

Time taken for 100 calculations (SAMPLESUM): 44.140625 seconds
Time taken for 100 calculations (SAMPLESUM2): 12.546875 seconds

I think I’ll go with door number 2.

An Undo Stack in VBA

I’m working on an Excel add-in which will allow a user to perform operations on cells. I’d also like to allow the user to ‘undo’ these operations. Excel itself, of course, does have an Undo button (shortcut Ctrl + Z), but this works only to undo changes made in the user interface, and is not going to help for changes made by VBA. This means the add-in will need to remember each operation performed and have them ready to supply for un-doing should the user want to do so.

Many people have created their own general solutions for undoing changes made in VBA – for instance, here’s one from Jan Karel Pieterse (who also created the hugely useful Name Manager add-in): Undo With Excel VBA. However I decided not to use a solution like this – it’s quite a lot of code to add to a project, and it is very general.

In this case I’m happy to write a more specific solution, which copes with undoing a specific action, by simply delegating the work of undoing the action to the object which did the action in the first place. Here’s an example:

' CellTest Class - allows calling code to change the value of a
' cell and then change it back

'*****************************************************************
' Private Variables
'*****************************************************************
Private mCell As Range
Private vOrigFormula As Variant

'*****************************************************************
' Public Properties
'*****************************************************************
Public Property Get Address() As String
  Address = mCell.Address
End Property

Public Property Set Cell(ByRef rCell As Range)
  Set mCell = rCell
  vOrigFormula = rCell.Formula
End Property

'*****************************************************************
' Public Methods
'*****************************************************************
Public Sub Change(ByVal sText As String)
  mCell.Value = sText
End Sub

Public Sub ChangeBack()
  mCell.Formula = vOrigFormula
End Sub

So what about storing up a list of operations for undoing? This is ideally suited to a stack. Stacks are neat data structures – they don’t do much, but they do it well and they’re really easy to code. Here’s one I picked up from the VBA Developer’s Handbook and tweaked a little to suit. There are two classes, StackItem and Stack. StackItem simply stores a value and a reference to another StackItem object – the next item on the stack.

' StackItem class holds a reference to the object it refers to and
' a reference to the next item in the stack

Public Value As Variant
Public NextItem As StackItem

Stack stores a reference to the item on the top of the stack, and exposes the functions Push (to push a new item on to the top) and Pop (to remove the top item and return a reference to it). Properties IsEmpty and Peek tell you if there are any items in the stack, and what the value of the top item is (without removing it), respectively. There’s a very nice visual explanation of how this looks here.

' Stack holds a reference to the top item in the stack data structure
' and manages the adding and removing of stack items

Private siTop As StackItem

'*****************************************************************
' Public Properties
'*****************************************************************
Public Property Get IsEmpty() As Boolean
    IsEmpty = siTop Is Nothing
End Property

Public Property Get Peek() As Variant
    If IsObject(siTop.Value) Then
        Set Peek = siTop.Value
    Else
        Peek = siTop.Value
    End If
End Property

'*****************************************************************
' Public Methods
'*****************************************************************
Public Function Push(ByRef varIn As Variant) As Boolean

    Dim siNew As StackItem

On Error GoTo PushError

    Set siNew = New StackItem
    If IsObject(varIn) Then
        Set siNew.Value = varIn
    Else
        siNew.Value = varIn
    End If
    Set siNew.NextItem = siTop
    Set siTop = siNew
    
    Push = True
    
Exit Function

PushError:
    Push = False
End Function

Public Function Pop() As Variant

    If IsObject(siTop.Value) Then
        Set Pop = siTop.Value
    Else
        Pop = siTop.Value
    End If
    Set siTop = siTop.NextItem

End Function

So now to store an actions for undoing, I just need to push the object which performed it onto the undo stack. Every time I wish to actually undo an action, I just pop the object reference off, and instruct it to undo.

Next time, I’ll demonstrate how this looks in practice.

Count Distinct Values

I’m sorry for the lack of posts over the last few months – what can I say, I’ve been busy. I had meant to continue the series on data access patterns from here and I put a revision of the code here, but I haven’t had the time to do much more. I may get back to it eventually, but in the meantime I thought I’d post something else which I’ve found dictionaries to be quite useful for.

The below is a function which gives a count of the distinct items in an array. As items are added, the dictionary will prevent duplication of key values – and the Count method gives the number of keys in the dictionary.

Public Function CountDistinct(ByRef vArr As Variant) As Variant

    Dim dict As Scripting.Dictionary
    Dim lVal As Long
    
    Set dict = New Scripting.Dictionary
    
    For lVal = LBound(vArr) To UBound(vArr)
        dict(vArr(lVal)) = vArr(lVal)
    Next lVal
    
    CountDistinct = dict.Count

End Function

?CountDistinct(array(1,2,3,3,4))
 4

A much simpler implementation is available in Python, using the set datatype:

>>> def CountDistinct(vals):
...   return len(set(vals))
... 
>>> CountDistinct([1, 2, 3, 3, 4])
4

Dictionary vs Collection

A pretty common requirement in programming is to be able to create a group of related objects, and put them into some sort of container. This means they can effectively be treated as one object, while still being accessible individually. For example, if I’m tracking the stats of a sports team over the course of a game, I could create a bunch of Player objects, all of which have individual attributes, but which I can view collectively as a Team object.

One way to group the Players would be to put them into a Team array. A weakness of this approach is that the only way to access one of the Players is to use their position in the array. This is ok for arrays where the positions of the members aren’t going to change, but if this isn’t the case it gets harder to keep track of things the longer the array persists in memory and the more changes need to be made to it. What is useful in this case is to be able to retrieve a Player object by using a key that uniquely identifies them, such as their name for instance, and not have to worry about which position in the Team array they are in.

The VB Collection object allows you to do this, by giving you the option of defining a ‘key’ value when the item is added to it – for instance, the below adds the range object identified by the name RangeName to a collection, and uses RangeName as the key:

Dim col As Collection
Dim rng As Excel.Range
Dim sRangeName As String
Set col = New Collection
sRangeName = "RangeName"
Set rng = Range(sRangeName)
col.Add rng, sRangeName

Now, when you wish to retrieve a particular item from the collection, you have the option of either specifying its position (index), or using the key value. If you know this was the first item added and its position hasn’t changed, you can use

col.Item(1)

(VB collections are 1-based). Otherwise, you can use

col.Item(sRangeName)

You can also iterate over the collection using For Each…Next syntax, which is both more concise and faster than accessing an array one item at a time.

Collections are fine – and pretty much all the container structures in the Excel object model use them, which is why you can say

For Each wb in Workbooks

or

For Each cell in rng.Cells

However, I hardly ever use them for implementing my own containers – where key-based lookup is required I prefer the VBScript Dictionary. The syntax for adding objects to a dictionary is very similar to adding items to a collection, apart from the reversal of the key and item arguments:

Dim dict As Scripting.Dictionary
Dim rng As Excel.Range
Dim sRangeName As String
Set dict = New Scripting.Dictionary
sRangeName = "RangeName"
Set rng = Range(sRangeName)
dict.Add sRangeName, rng

as is the retrieval of items:

dict.Item(sRangeName)

except for the fact that you have to use the key (which is safer in any case).

What really sells the dictionary to me over the collection are the following 3 methods:

dict.Exists(k) – boolean to let you know whether or not the key k has been added to the dictionary. This is very useful, as it avoids the need for the clunky idiom you’re forced to use to check the same thing with collections:

Public Function Exists(ByRef col As Collection, _
                    ByVal sKey As String) As Boolean

    Dim lCheck as Long
    On Error Resume Next
    lCheck = VarType(col.Item(sKey))
    If Err.Number = 0 Then
        Exists = True
    Else
        Exists = False
    End If

End Function

dict.Keys – Return an array containing all keys in the dictionary, which can then be iterated over. This for me is where the collection really falls down, as while it’s certainly possible to iterate over the items in a collection, the key cannot be returned from the item – which means it’s not available. A good example of why you might want this is the following from the CRXRowDict.Test function in the last post:

For iKey = LBound(mdicValues.Keys) To UBound(mdicValues.Keys)
        sKey = mdicValues.Keys(iKey)
        Set objRegex = mdicValues(sKey)
        sTest = clsRow.Item(sKey)
        Debug.Print objRegex.Pattern, sTest
        ' if any test fails, return false
        If Not objRegex.Test(sTest) Then
            bResult = False
            Exit For
        End If
    Next iKey

The key is required in two places – once to retrieve the matching regexp object from the dictionary and once to retrieve the test string from the CRowDict instance. The only way I could see to achieve the same result with a collection would be to populate and iterate over a separate array – not pretty.

dict.RemoveAll – As the name suggests, clean out all key-item pairs in the dictionary. No need to loop through either keys or items here, quick and easy.

More Regex Functions, part 2

Part 2 of 3

After building the form in Part 1, the next stage of coding the regex filter involves implementing the logic for deciding which rows of the list to be filtered will pass through the filter. Each row in the criteria range (rngCriteria) can be thought of as a test made up of a set of criteria. In order for a row in the list range (rngSource) to pass the test, all of these criteria must be met. If the list row passes the test for any row in rngCriteria, then it will be included in the eventual output.

Given a criteria range with m rows (excluding the header row) and n columns, this can be modelled in pseudo-SQL:

SELECT * FROM rngSource AS a
WHERE 
	(Crit[1,1](a) AND ... AND Crit[1,n](a))
	OR ...
	OR (Crit[m,1](a) AND ... AND Crit[m,n](a))

where Crit[i,j](a) is a boolean function representing the criterion in row i, column j.

For the built-in advanced filter, Excel constructs these criteria by parsing the text in the criteria range – a large range of tests are available, including the use of relational operators, wildcard and exact text matches. For an introduction to use of the advanced filter, see the tutorial at Contextures.

For our purposes, though, we’ll be allowing only regular expression pattern matches on the text in the list rows. While in theory it would be possible to mimic all the built-in advanced filter functionality and add regular expression tests into the mix, I’m not attempting to extend the advanced filter at this point – instead I’m creating an alternative to it.

In order to test the rows, I decided to implement a couple of classes to contain them – one for the rows in the list and extract ranges, and one which works slightly differently for the criteria range. In both classes, the core idea is to use the Scripting.Dictionary object to store key-value pairs, where the keys are provided by the header rows and the values by the text in each cell.

The first of these, CRowDict, is what I intend to use for the rows in the list and extract ranges:

Option Explicit

' CRowDict - wraps dictionary, holds key-value pairings
' for each row in the list

'********************************************************************
' Class private attributes
'********************************************************************

' holds dictionary object to be populated
Private mdicValues As Scripting.Dictionary
Private Const msCLASS As String = "CRowDict"

'********************************************************************
' Class public properties
'********************************************************************
Public Property Get Item(ByVal sKey As String) As Variant
    If mdicValues.Exists(sKey) Then
        Item = mdicValues.Item(sKey)
    Else
        Err.Raise glNOTFOUND, msCLASS, gsNOTFOUND & sKey
    End If
End Property

'********************************************************************
' Class public methods
'********************************************************************

Public Sub Populate(ByRef rngHeaders As Excel.Range, _
                ByRef rngValues As Excel.Range)
    ' Populates dictionary from the row rngValues
    
    Dim colNum As Long
    
    For colNum = 1 To rngHeaders.Cells.Count
        mdicValues.Add rngHeaders(1, colNum).Text, _
                        rngValues(1, colNum).Text
    Next colNum

End Sub

Public Sub Display()
    ' Convenience method for testing
    Dim vKey As Variant
    For Each vKey In mdicValues.Keys
        Debug.Print vKey, mdicValues.Item(vKey)
    Next vKey
End Sub

'********************************************************************
' Class private methods
'********************************************************************

Private Sub Class_Initialize()
    Set mdicValues = New Scripting.Dictionary
End Sub

The idea here is to allow us to access the text values in each row by name rather than position – this allows us the flexibility to change the order and/or number of columns returned to the extract range.

The class CRXRowDict is very similar, but stores RegExp objects rather than text. It also includes a Test method, which operates on CRowDict instances and returns True if all of the required text values match the corresponding regex patterns:

Option Explicit

' CRXRowDict - wraps dictionary, holds key-value pairings
' for each row in the list, where values are RegExp objects

'********************************************************************
' Class private attributes
'********************************************************************

' holds dictionary object to be populated
Private mdicValues As Scripting.Dictionary
Private Const msCLASS As String = "CRXRowDict"

'********************************************************************
' Class public properties
'********************************************************************
Public Property Get Item(ByVal sKey As String) _
                            As VBScript_RegExp_55.RegExp
    If mdicValues.Exists(sKey) Then
        Set Item = mdicValues.Item(sKey)
    Else
        Err.Raise glNOTFOUND, msCLASS, gsNOTFOUND & sKey
    End If
End Property

'********************************************************************
' Class public methods
'********************************************************************

Public Sub Populate(ByRef rngHeaders As Excel.Range, _
                ByRef rngValues As Excel.Range)
    ' Populates dictionary from the row rngValues
    
    Dim colNum As Long
    Dim objRegex As VBScript_RegExp_55.RegExp
    
    ' create new regexp object for each cell in the row
    For colNum = 1 To rngHeaders.Cells.Count
        Set objRegex = New VBScript_RegExp_55.RegExp
        objRegex.Pattern = rngValues(1, colNum).Text
        mdicValues.Add rngHeaders(1, colNum).Text, objRegex
    Next colNum

End Sub

Public Function Test(ByRef clsRow As CRowDict) As Boolean
    ' Executes the test method for each regexp object
    ' against the corresponding item in clsRow
    
    Dim objRegex As VBScript_RegExp_55.RegExp
    Dim iKey As Integer
    Dim sKey As String
    Dim sTest As String
    Dim bResult As Boolean
    
    ' Assume true to begin with
    bResult = True
    
    For iKey = LBound(mdicValues.Keys) To UBound(mdicValues.Keys)
        sKey = mdicValues.Keys(iKey)
        Set objRegex = mdicValues(sKey)
        sTest = clsRow.Item(sKey)
        Debug.Print objRegex.Pattern, sTest
        ' if any test fails, return false
        If Not objRegex.Test(sTest) Then
            bResult = False
            Exit For
        End If
    Next iKey
    
    Test = bResult
    
End Function

'********************************************************************
' Class private methods
'********************************************************************

Private Sub Class_Initialize()
    Set mdicValues = New Scripting.Dictionary
End Sub

In the final part of this series (which I hope to put out by next week) I’ll be putting this all together, showing how these classes are used to implement the filter function.

More Regex Functions

And finally, I’m back! I finally overcame the inertia yesterday and got back to work on the regular expressions add-in. The next goal is to include regular expressions equivalents for Excel’s built-in advanced filter and find/replace functions. So far I’ve concentrated on just the filter part, which I hope to have finished fairly soon. However as there’s a fair amount of code involved, I’ll split this into 2 or 3 posts.

Firstly, the advanced filter dialog. Not being a big fan of re-inventing the wheel I decided to build pretty much exactly the same form as you find Excel presenting you with for an ordinary advanced filter. Here’s what my version looks like:

 

All that this form needs to do is collect 3 key pieces of information to control the filter process:

  1. What ranges will the filter operate on?
  2. Will the source range (List Range) be filtered in place, or will the filtered results be copied to another place (Copy to)?
  3. Does the filtered list need to display unique records only?

Here’s the code behind the form. The public properties List, Criteria and Extract return the addresses for the ranges in (1) from the RefEdit controls, and the booleans CopyTo and UniqueOnly return the values for (2) and (3) respectively.

Option Explicit

Private mbOK As Boolean

'**********************************************************
' Form class public properties
'**********************************************************

' User clicked OK
Public Property Get OK() As Boolean
    OK = mbOK
End Property
'**********************************************************
' Ranges selected for filter
'**********************************************************
Public Property Get List() As String
' List range to be filtered
    List = refList.Value
End Property

Public Property Get Criteria() As String
' Range containing criteria for filter
    Criteria = refCriteria.Value
End Property

Public Property Get Extract() As String
' Range to copy filtered rows to
    Extract = refExtract.Value
End Property

'**********************************************************
' Properties modifying filter operation
'**********************************************************

' User selected Copy to range option button
Public Property Get CopyTo() As Boolean
    CopyTo = optCopy.Value
End Property

' User selected Unique values only checkbox
Public Property Get UniqueOnly() As Boolean
    UniqueOnly = chkUnique.Value
End Property

'**********************************************************
' Form controls
'**********************************************************
Private Sub cmdCancel_Click()
    mbOK = False
    Me.Hide
End Sub

Private Sub cmdOk_Click()
    mbOK = True
    Me.Hide
End Sub

Private Sub optCopy_Click()
    
    ' enable refExtract
    lblExtract.ForeColor = vbButtonText
    With refExtract
        .Enabled = True
        .BackColor = vbWindowBackground
    End With
    
End Sub

Private Sub optInPlace_Click()

    ' grey out and disable refExtract
    lblExtract.ForeColor = vbGrayText
    With refExtract
        .Enabled = False
        .BackColor = vbInactiveCaptionText
    End With
    
End Sub

Private Sub UserForm_Initialize()

    ' Set default values for List range and optInPlace
    refList.Value = Application.ActiveCell.CurrentRegion.Address
    optInPlace.Value = True

End Sub

Private Sub UserForm_QueryClose(Cancel As Integer, CloseMode As Integer)

    If CloseMode = vbFormControlMenu Then
        cmdCancel_Click
        Cancel = True
    End If

End Sub

And here’s the sub GetRanges, which ‘reads’ the form:

Private Sub GetRanges(ByRef rngSource As Excel.Range, _
                ByRef rngCriteria As Excel.Range, _
                ByRef rngExtract As Excel.Range, _
                ByRef bUnique As Boolean)
                
    Dim frmFilter As FRXFilter

    Set frmFilter = New FRXFilter
    frmFilter.Show
    
    If frmFilter.OK Then
        With frmFilter
            If .List <> "" Then _
                Set rngSource = Range(.List)
            If .Criteria <> "" Then _
                Set rngCriteria = Range(.Criteria)
            If .CopyTo Then _
                Set rngExtract = Range(.Extract)
            bUnique = .UniqueOnly
        End With
    End If
                
End Sub

I also want to include some error-handling to cope with bad range addresses, but apart from that it’s pretty much there. Next time I’ll post the sub at the next level up, calling GetRanges and a couple of other routines to perform the filtering.

Accessing Dynamic Named Ranges

I was planning to continue the series with a post on accessing Excel data from Python – this is a brief diversion related to a problem I’ve encountered recently with using SAS to read Excel files.

I have a bunch of Excel files which I need to read every month. The relevant data is contained in a dynamic named range – that is, a name where the RefersTo string is typically a formula of the form

=OFFSET(fixedCell,0,0,COUNTA(columnInRange),rangeWidth)

This is useful when the amount of data varies from file to file – in this case, some files will have 5 rows, and some will have 5,000. The use of the COUNTA() function means that the number of rows in the range is evaluated by Excel, hence the height of the range adjusts automatically. Which is great, when you’re reading those ranges in Excel, but which can cause problems when you’re trying to get at them without the benefit of the Excel calculation engine. Case in point – the read_file macro from a previous post which uses the SAS Excel libname engine, fails to read dynamic ranges. I’m not 100% sure that it’s impossible to do so (the wonderful folks at SAS tech support are looking into it for me), but it has always failed so far.

So, I need a workaround. I came up with the following – it’s a bit of a clunker, but it works. The basic idea: run an Excel macro over the files which copies the data from the dynamic range into a static range in a new workbook.

Private Sub ReadFile(ByVal sBookName As String)
' Reads sitting time files in INPATH, copies all time entries
' to a new workbook in OUTPATH
Dim wbkFile As Excel.Workbook
Dim wbkOut As Excel.Workbook
Dim rngData As Excel.Range
Dim rngOut As Excel.Range
Dim sBranchName As String
Dim sMonth As String


' Specify required values for output range
Set wbkFile = Application.Workbooks.Open(INPATH & sBookName)
Set rngData = wbkFile.Names("rngTimeData").RefersToRange
sBranchName = wbkFile.Names("BranchName").RefersToRange.Value
sMonth = Format$(Evaluate(wbkFile.Names("Month").RefersTo), _
        "mmyy")

' Add new workbook and specifies range to dump data into
Set wbkOut = Application.Workbooks.Add()
Set rngOut = wbkOut.Worksheets(1).Cells(1, 1)
Set rngOut = rngOut.Resize(rngData.Rows.Count, _
        rngData.Columns.Count)
' transfer values across, name the range, save and close the new
' workbook
With rngOut
    .Value = rngData.Value
End With
wbkOut.Names.Add "TimeData", rngOut


wbkFile.Close False
sBranchName = Replace$(sBranchName, " ", "_")
sBranchName = Left$(sBranchName, InStr(sBranchName, "_DC") - 1)
wbkOut.SaveAs OUTPATH & sBranchName & "_" & sMonth, _
        xlWorkbookNormal
wbkOut.Close


End Sub

Public Sub GetFiles()
' Read all files in INPATH, save copies into OUTPATH and check 
' which ones are missing
Dim fsObj As Scripting.FileSystemObject
Dim inFolder As Scripting.Folder
Dim inFile As Scripting.File


Application.ScreenUpdating = False


Set fsObj = CreateObject("Scripting.FileSystemObject")
Set inFolder = fsObj.GetFolder(INPATH)


For Each inFile In inFolder.Files
    ReadFile inFile.Name
Next inFile


OutputDirectoryContents


Application.ScreenUpdating = True


End Sub

OutputDirectoryContents is a macro which writes the names of all the files in the directory OUTPATH to the main worksheet, then runs an advanced filter over the table of files that should be in there, returning those that aren’t (the Criteria range is set to File Found =FALSE):

Private Sub OutputDirectoryContents()

' Lists contents of OUTPATH on the front sheet and returns any
' missing filenames
Dim fsObj As Scripting.FileSystemObject
Dim outFolder As Scripting.Folder
Dim outFile As Scripting.File
Dim wksht As Excel.Worksheet
Dim rngFiles As Excel.Range
Dim startCell As Excel.Range
Dim rngSource As Excel.Range
Dim rngCriteria As Excel.Range
Dim rngExtract As Excel.Range


' set required variables...
Set fsObj = CreateObject("Scripting.FileSystemObject")
Set outFolder = fsObj.GetFolder(OUTPATH)
Set wksht = ThisWorkbook.Worksheets(1)
Set rngFiles = ThisWorkbook.Names("Files_Found").RefersToRange
Set startCell = rngFiles(1)
Set rngSource = ThisWorkbook.Names("FileWishlist").RefersToRange
Set rngCriteria = ThisWorkbook.Names("rngCriteria").RefersToRange
Set rngExtract = ThisWorkbook.Names("rngExtract").RefersToRange


' Clear, then repopulate, range of files found in the specified folder
rngFiles.ClearContents


startCell.Value = Format$(DateAdd("m", -1, Date), "mmm-yyyy")
For Each outFile In outFolder.Files
    Set startCell = startCell.Offset(1)
    startCell.Value = outFile.Name
Next outFile


' Refresh filter to see which ones are missing...
RefreshAdvancedFilter rngCriteria, rngExtract, rngSource, , True


End Sub

Public Sub RefreshAdvancedFilter(ByRef rngCriteria As Excel.Range, _
               ByRef rngExtract As Excel.Range, _
               ByRef rngSource As Excel.Range, _
               Optional ByVal IsCopy As Boolean = True, _
               Optional ByVal IsUnique As Boolean = False)
' Refreshes advanced filter of the source range
Dim lCopy As Long


lCopy = IIf(IsCopy, xlFilterCopy, xlFilterInPlace)


rngSource.AdvancedFilter Action:=lCopy, criteriarange:=rngCriteria, _
    copytorange:=rngExtract, unique:=IsUnique


End Sub

Once I’m sure that all the files are there that need to be, I can run the SAS macro get_all_files over the contents of OUTPATH to create the datasets I want. Slightly convoluted (and a little wasteful in that the data is duplicated), but it gets the job done, so I’m happy.