I’ve been making a little more use of PowerShell recently. It’s a very handy addition to the Windows environment, certainly it’s a lot easier to use than the traditional command line, and it’s a lot easier to find your way around. I figured I’d share a couple of the scripts I’ve used for some specific tasks to do with file management, in case someone found this handy.
Recently we’ve been upgrading our SAS environment, so we’ve had to move a great deal of our data onto new servers. One of the concerns we had with migrating data was that some of our users had been storing disconnected SAS datasets in one of the directories to be migrated. Not really a great practice, and we didn’t want to waste bandwidth transporting over stale data unless there was a real need to. So, we needed a list of datasets – just the really big ones – that had been saved, so that the users responsible could justify the need for them in the new environment, or (preferably), bid them a fond farewell:
# the directory we are migrating $badpath = "\\no\big\datasets\here\please" # output to this file $outfile = "\\list\of\big\datasets.csv" <# Extract a list of SAS datasets over 100MB in size, and write out the name, last modified date, size and location to the output file. #> Get-ChildItem $badpath -Recurse -Filter *.sas7bdat | ? {$_.Length -gt 100MB} | Select-Object Name, LastWriteTime, @{Name="SizeInMB";Expression={$_.Length/1MB}}, @{Name="Path";Expression={$_.directory}} | Export-Csv -Path $outfile
The Get-ChildItem cmdlet looks at all items in the folder $badpath, and the Recurse parameter tells it to look in all subfolders also. The Filter parameter allows us to limit the results of this search to SAS datasets. That gets piped to the Where-Object cmdlet (here shortened to ?), which allows only those passing the size limit in the script block through to the next pipe. Finally Select-Object takes a selection of properties of the objects coming through the where-object cmdlet and the Export-Csv cmdlet sends them to an output file. Note that as we know we are dealing with dataset files, the objects coming through will have type FileInfo. If we had not specified SAS datasets in the filter we would have had to include another cmdlet to exclude non-file items like directories and archives.
Another task involved searching through our SAS source code for something we knew was going to cause us problems in the new environment. Without going into too much detail as to the why of this, we needed to find any location where the developer had tested for whether a variable was equal to -1. As it turned out, due to the way SAS Data Integration Studio works, -1 gets added into generated code in a lot of places, so we needed a regular expression pattern to limit the cases returned. We didn’t care about the instances where special ETL macro variables were set to -1, or were compared with -1. So rather than looking for just the literal “-1” string in code, we needed a negative lookbehind regex to exclude these cases:
# Source code directory $jobdir = "\\SAS\ETL\Code\Goes\Here" # Output file $outfile = "\\List\Of\Problem\Lines.csv" # Occurrence of -1 that we care about $pattern = "(?<!--((etls_\w+\s+=\s)|(etls_recnt[,=])))-1" <# Extract lines from SAS ETL source code matching the pattern specified and write out the line to the output file #> Select-String -Pattern "$pattern" -Path $jobdir\*.sas | Select-Object Filename, LineNumber, Line | Export-Csv $outfile
Select-String is pretty much the PowerShell equivalent of grep. It returns a list of MatchInfo objects which, as with the FileInfo objects in the other example, all get passed to Select-Object to pull out the properties we want to know about to be exported to the output file.
PowerShell documentation is available at https://technet.microsoft.com/en-us/library/bb978526.aspx but you’ll find the built-in F1 help in PowerShell is just as easy to find your way around. Happy scripting!