A couple of PowerShell scripts

I’ve been making a little more use of PowerShell recently. It’s a very handy addition to the Windows environment, certainly it’s a lot easier to use than the traditional command line, and it’s a lot easier to find your way around. I figured I’d share a couple of the scripts I’ve used for some specific tasks to do with file management, in case someone found this handy.

Recently we’ve been upgrading our SAS environment, so we’ve had to move a great deal of our data onto new servers. One of the concerns we had with migrating data was that some of our users had been storing disconnected SAS datasets in one of the directories to be migrated. Not really a great practice, and we didn’t want to waste bandwidth transporting over stale data unless there was a real need to. So, we needed a list of datasets – just the really big ones – that had been saved, so that the users responsible could justify the need for them in the new environment, or (preferably), bid them a fond farewell:

# the directory we are migrating
$badpath = "\\no\big\datasets\here\please"
# output to this file
$outfile = "\\list\of\big\datasets.csv"

  Extract a list of SAS datasets over 100MB in size, and
  write out the name, last modified date, size and location
  to the output file.
Get-ChildItem $badpath -Recurse -Filter *.sas7bdat |
 ? {$_.Length -gt 100MB} |
 Select-Object Name, LastWriteTime, 
 @{Name="Path";Expression={$_.directory}} |
 Export-Csv -Path $outfile

The Get-ChildItem cmdlet looks at all items in the folder $badpath, and the Recurse parameter tells it to look in all subfolders also. The Filter parameter allows us to limit the results of this search to SAS datasets. That gets piped to the Where-Object cmdlet (here shortened to ?), which allows only those passing the size limit in the script block through to the next pipe. Finally Select-Object takes a selection of properties of the objects coming through the where-object cmdlet and the Export-Csv cmdlet sends them to an output file. Note that as we know we are dealing with dataset files, the objects coming through will have type FileInfo. If we had not specified SAS datasets in the filter we would have had to include another cmdlet to exclude non-file items like directories and archives.

Another task involved searching through our SAS source code for something we knew was going to cause us problems in the new environment. Without going into too much detail as to the why of this, we needed to find any location where the developer had tested for whether a variable was equal to -1. As it turned out, due to the way SAS Data Integration Studio works, -1 gets added into generated code in a lot of places, so we needed a regular expression pattern to limit the cases returned. We didn’t care about the instances where special ETL macro variables were set to -1, or were compared with -1. So rather than looking for just the literal “-1” string in code, we needed a negative lookbehind regex to exclude these cases:

# Source code directory
$jobdir = "\\SAS\ETL\Code\Goes\Here"
# Output file
$outfile = "\\List\Of\Problem\Lines.csv"
# Occurrence of -1 that we care about
$pattern = "(?<!--((etls_\w+\s+=\s)|(etls_recnt[,=])))-1"

  Extract lines from SAS ETL source code matching the pattern
  specified and write out the line to the output file
Select-String -Pattern "$pattern" -Path $jobdir\*.sas |
 Select-Object Filename, LineNumber, Line |
  Export-Csv $outfile

Select-String is pretty much the PowerShell equivalent of grep. It returns a list of MatchInfo objects which, as with the FileInfo objects in the other example, all get passed to Select-Object to pull out the properties we want to know about to be exported to the output file.

PowerShell documentation is available at https://technet.microsoft.com/en-us/library/bb978526.aspx but you’ll find the built-in F1 help in PowerShell is just as easy to find your way around. Happy scripting!

Why I’d like Python at work, part 2

Automating Excel with Python

As well as using Python to interact with the filesystem, it’s possible to use COM automation to open Excel and access its functionality. I decided to take a look at Mark Hammond’s PythonWin IDE, which contains the win32com.client package to provide a wrapper for the Excel automation object. For example, the function newest from part 1 could be implemented as follows:

# OpenNewest.py - opens latest Excel file
import os, os.path, win32com.client

def newest(ext, path=os.getcwd()):
    '''pre: path a valid absolute filepath (optional), ext a file
    extension (e.g. '.xls')
    returns: name of the newest (last modified on *nix systems,
    last created on Windows) file with extension ext from path
    file_dict = dict([(fname, \
      os.path.getctime(os.path.join(path, fname))) \
      for fname in os.listdir(path) \
      if os.path.splitext(fname)[1] == ext])
    return os.path.join(path, \
                        max(file_dict, key=lambda x: file_dict[x]))

if __name__ == __main__:
  xlApp = win32com.client.Dispatch("Excel.Application")
  lastFileName = newest('.xls')
  lastFile = xlApp.Workbooks.Open(lastFileName)

  # Do some stuff with lastFile...

  xlApp = None

So, what stuff can you do? One thing that would be nice is to be able to open an Excel file and run a macro in it. I decided to test this out by saving a file named ‘PythonTest.xls’, containing the macro ‘helloFromPython’:

Public Sub helloFromPython()
Sheet1.Range("B1:B100").Value = "Hello from Python!"
End Sub

The script RunInExcel.py then creates an Excel automation object, which opens the file and calls the macro in it:

# RunInExcel.py - opens named Excel file fName, runs named macro
# macName
import os, os.path, win32com.client

def run_macro(fName, macName, path=os.getcwd()):
  pre: fName is the name of a valid Excel file with macro macName
  post: fName!macName is run, fName saved and closed
  fName = os.path.join(path, fName)
  xlApp = win32com.client.Dispatch("Excel.Application")
  fTest = xlApp.Workbooks.Open(fName)
  macName = fTest.Name + '!' + macName
  xlApp = None

if __name__ == "__main__":
    run_macro("PythonTest.xls", "helloFromPython")

Run the script, open the file and voila!


What I’m thinking would be really nice is to be able to automate a lot of our regular reporting by doing this – a single script could be run to kick off a batch of reports. I wonder, though, about how efficient using Python to do this would be as opposed to using a VBA/VBScript routine to do the same thing…does anyone else use Python for this sort of task?