Occasionally we get asked to provide random samples from our database, and it’s the sort of thing that can be done very easily in SAS.
For example, here’s a SAS routine I have often used to extract a sample sampleset of size 100 without replacement from a large dataset bigset (I can’t remember where this code originally came from but I definitely can’t take credit for it):
data sampleset (drop=remainder size);
do while (size > 0);
pickit + 1;
if ranuni(0) < size/remainder then do; set bigset point=pickit nobs=totobs; output; size=size - 1; end; remainder=remainder - 1; end; stop; run;[/sourcecode] While this is usually sufficient, I have sometimes wondered whether it would be possible to use the same logic to extract a sample in VBA. So tonight I decided to test it out. The following code appears to do the trick quite nicely, but be warned, I haven't done a lot of testing yet. [sourcecode language='vb']Public Sub SampleArray(ByRef avBigset As Variant, ByRef avSmallset As Variant) ' SampleArray populates a random sample avSmallset from an array avBigset ' without replacement (each element in avBigset is considered once only) Dim lRemainder As Long Dim lSize As Long Dim lOb As Long Dim lPickit As Long ' Make sure we're dealing with arrays... If Not IsArray(avBigset) Or Not IsArray(avSmallset) Then Exit Sub ' Initialise lRemainder = UBound(avBigset) lSize = UBound(avSmallset) Randomize 0 Do While lSize > 0 ‘ Still some left to pick up
lOb = lOb + 1
If Rnd < lSize / lRemainder Then lPickit = lPickit + 1 avSmallset(lPickit) = avBigset(lOb) lSize = lSize - 1 End If lRemainder = lRemainder - 1 Loop ' Sample complete End Sub[/sourcecode] It coped fairly well with populating a sample array of size 1,000 from an array of size 10,000,000, but as I say I haven't tested too extensively so I don't yet know its limits, or whether there is a much more efficient method out there. Has anyone else used VBA in this way? What sort of method would you use?