Filtered Azure Blob to Blob Copy

I recently had the job of copying tens of thousands of IIS log files, each one at least 100MB, from one Azure Storage account to another. Using something simple like CloudBerry just wasn’t going to cut it: it copies each file to the local client first and then pushes it back up into Azure, which is not efficient at all.

A quick bit of digging and I discovered that the Azure PowerShell cmdlet Start-AzureStorageBlobCopy lets you trigger a copy directly from Azure to Azure, which runs very quickly. It will even let you copy an entire container from one storage account to another. What it won’t let you do is pass in a filter so that only the blobs matching the filter are copied.

So here’s a function that I wrote to get that functionality, with some progress bars and timers for added effect 🙂

Function Start-AzureStorageBlobContainerCopy (
    [Parameter(Mandatory=$true)][String]$srcStorageAccountName,
    [Parameter(Mandatory=$true)][String]$destStorageAccountName,
    [Parameter(Mandatory=$true)][String]$SrcStorageAccountKey,
    [Parameter(Mandatory=$true)][String]$DestStorageAccountKey,
    [Parameter(Mandatory=$true)][String]$SrcContainer,
    [Parameter(Mandatory=$true)][String]$DestContainer,
    [String]$filter = ""
)
{
    Import-Module Azure

    $srcContext = New-AzureStorageContext -StorageAccountName $srcStorageAccountName -StorageAccountKey $SrcStorageAccountKey
    $destContext = New-AzureStorageContext -StorageAccountName $destStorageAccountName -StorageAccountKey $DestStorageAccountKey

    # Index the source container; the filter is a regular expression (-match)
    $timeTaken = Measure-Command {
        if ($filter -ne "")
        {
            $blobs = @(Get-AzureStorageBlob -Container $SrcContainer -Context $srcContext | ? {$_.Name -match $filter})
        }
        else
        {
            $blobs = @(Get-AzureStorageBlob -Container $SrcContainer -Context $srcContext)
        }
    }

    Write-Host "Total Time to index $timeTaken" -BackgroundColor Black -ForegroundColor Green

    $i = 0

    # Start-AzureStorageBlobCopy only starts the server-side copy,
    # so this measures the time to issue the requests
    $timeTaken = Measure-Command {
        foreach ($blob in $blobs)
        {
            $i++
            # The original never set this, so the progress bar stayed empty
            $percentComplete = [int](($i / $blobs.Count) * 100)
            Write-Progress -Activity:"Copying..." -Status:"Copied $i of $($blobs.Count) : $($percentComplete)%" -PercentComplete:$percentComplete

            $copyInfo = Start-AzureStorageBlobCopy -ICloudBlob $blob.ICloudBlob -Context $srcContext -DestContainer $DestContainer -DestContext $destContext -Force
            Write-Host (Get-Date) $copyInfo.Name
        }
    }

    Write-Host
    Write-Host "Total Time $timeTaken" -BackgroundColor Black -ForegroundColor Green
}

Start-AzureStorageBlobContainerCopy -srcStorageAccountName "<src Storage>" -SrcStorageAccountKey "<src key>" -SrcContainer "<src Container>" `
    -destStorageAccountName "<dest Storage>" -DestStorageAccountKey "<dest key>" -DestContainer "<dest Container>" `
    -filter "<filter>"
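One thing worth knowing about the filter: the function hands it to -match, so it is treated as a regular expression and not as a wildcard. Here is a minimal sketch of that behaviour that runs without a storage account. The blob names are made up, and Select-BlobName is a hypothetical helper of mine, not part of the Azure module.

```powershell
# Hypothetical helper: applies the same -match test the function uses,
# but against plain strings so it can be run without Azure.
function Select-BlobName {
    param(
        [string[]]$Name,
        [string]$Filter = ""
    )
    if ($Filter -eq "") { return $Name }
    return $Name | Where-Object { $_ -match $Filter }
}

# Made-up blob names, for illustration only
$names = @(
    "W3SVC1/u_ex140101.log",
    "W3SVC1/u_ex140102.log",
    "W3SVC2/u_ex140101.log",
    "readme.txt"
)

# Regular expression: everything under W3SVC1 that ends in .log
$matched = Select-BlobName -Name $names -Filter '^W3SVC1/.*\.log$'
$matched
```

So where you might reach for a wildcard like *.log, the regular expression equivalent is \.log$ (with the dot escaped and the end of the name anchored).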

THIS POSTING AND CODE RELATED TO IT ARE PROVIDED “AS IS” AND CONFER NO WARRANTIES OR RIGHTS, USE AT YOUR OWN RISK

Switching garbage collection on in an Azure Worker role

Whilst working on an issue with Microsoft on one of our production environments, we came across the fact that an Azure Worker role, by default, has its garbage collection set to workstation mode and not server mode. If you are using a Medium or larger instance (and hence multiple processors), you could see a performance benefit by switching to server mode.
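If you want to see which mode a process is actually running in, .NET exposes it through System.Runtime.GCSettings. A small sketch follows. Bear in mind it reports on the process it runs in (here, the PowerShell host), so to check the worker itself you would need to log the same property from inside your role code.

```powershell
# Reports the GC mode of the *current* process (the PowerShell host),
# not of WaWorkerHost.exe.
$gcInfo = New-Object PSObject -Property @{
    IsServerGC     = [System.Runtime.GCSettings]::IsServerGC
    LatencyMode    = [System.Runtime.GCSettings]::LatencyMode
    ProcessorCount = [Environment]::ProcessorCount
}
$gcInfo | Format-List
```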

Unfortunately the Azure tooling does not currently allow you to configure this setting directly, so you have to do it in a roundabout fashion, by creating a startup task that makes the change as the instance boots.

First define a startup task in the Service Definition of your worker role:

<WorkerRole name="WorkerRole1" vmsize="Medium">
  <Startup>
    <Task commandLine="startup.cmd" executionContext="elevated" taskType="simple" />
  </Startup>

Now create a “startup.cmd” in the root of your worker role; this will be used to kick off the PowerShell script that modifies the config file.

@echo off
powershell -command "Set-ExecutionPolicy RemoteSigned"
powershell .\setServerGC.ps1 2>> err.out

And finally create the “setServerGC.ps1” file in the root of your worker role. This is the file that will actually make the modifications.

# Load up the XML
$configFile = "${env:RoleRoot}\base\x64\WaWorkerHost.exe.config"
[xml]$waXML = Get-Content $configFile

# Only make the change if neither setting is already present
if (($waXML.configuration.runtime.gcServer -eq $null) -and ($waXML.configuration.runtime.gcConcurrent -eq $null))
{
    # Modify XML
    $gcServerEl = $waXML.CreateElement('gcServer')
    $gcConcurrentEl = $waXML.CreateElement('gcConcurrent')

    $gcServerAtt = $waXML.CreateAttribute("enabled")
    $gcServerAtt.Value = "true"
    $gcConcurrentAtt = $waXML.CreateAttribute("enabled")
    $gcConcurrentAtt.Value = "true"

    $gcServerEl.Attributes.Append($gcServerAtt) | Out-Null
    $gcConcurrentEl.Attributes.Append($gcConcurrentAtt) | Out-Null

    $waXML.configuration.runtime.appendChild($gcServerEl) | Out-Null
    $waXML.configuration.runtime.appendChild($gcConcurrentEl) | Out-Null

    $waXML.Save($configFile)

    # Restart WaWorkerHost.exe so it picks up the new config
    Get-Process | ? {$_.name -match "WaHostBootstrapper"} | Stop-Process -Force
    Get-Process | ? {$_.name -match "WaWorkerHost"} | Stop-Process -Force
}
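If you want to see what that edit does without touching a real instance, you can try it on an in-memory document. This is only a sketch: the sample XML is a cut-down stand-in and not the real contents of WaWorkerHost.exe.config, it uses SetAttribute as a shorter alternative to building the attributes by hand, and unlike the script above it checks each element separately.

```powershell
# Cut-down stand-in for WaWorkerHost.exe.config (assumed shape, illustration only)
[xml]$sample = '<configuration><runtime /></configuration>'

# SelectSingleNode returns an XmlElement even when <runtime> is empty
$runtime = $sample.SelectSingleNode('/configuration/runtime')

foreach ($name in 'gcServer', 'gcConcurrent') {
    # Only add the element if it is not already present
    if ($null -eq $runtime.SelectSingleNode($name)) {
        $el = $sample.CreateElement($name)
        $el.SetAttribute('enabled', 'true')
        $runtime.AppendChild($el) | Out-Null
    }
}

# Show the resulting XML
$sample.OuterXml
```

Running the loop a second time leaves the document unchanged, which matters for a startup task because it runs again every time the instance reboots.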

We saw a significant performance boost on the role we deployed this on, but your mileage will vary depending on your workload.


Simple Azure Storage Queue Monitor

If you need to monitor the length of a queue in Azure, you can use the Azure PowerShell cmdlets to help you out.

Below is a sample ticker script that uses the Azure cmdlets (so make sure you have them installed) to poll the configured queue every 10 seconds.

clear
Import-Module Azure

$cert = Get-Item cert:\currentuser\my\<cert thumbprint>   # management cert
$subID = "<subscription ID>"                              # Subscription ID
$storageAccount = "<storage account>"                     # storage account where queue lives
$queueName = "<queueName>"                                # Queue you're interested in
$interval = 10                                            # Time between ticks, in seconds

Set-AzureSubscription -SubscriptionID $subID -Certificate $cert `
    -SubscriptionName "CurrentSubscription" `
    -CurrentStorageAccount $storageAccount

Select-AzureSubscription -SubscriptionName "CurrentSubscription"

# do forever loop
do
{
    # measure how long it takes to run the command
    $timeTaken = Measure-Command {
        # get the queue info
        $queueInfo1 = Get-AzureStorageQueue -Name $queueName

        # write it to screen
        Write-Host (Get-Date) $queueInfo1.ApproximateMessageCount
    }

    # take the time taken to run the command off the interval time
    $totalTimeToWait = New-TimeSpan -Seconds $interval
    $timeToWait = $totalTimeToWait - $timeTaken

    # go to sleep, but never for a negative time if the poll overran the interval
    Start-Sleep -Seconds ([Math]::Max(0, $timeToWait.TotalSeconds))
} while ($true)
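The wait calculation is worth looking at on its own: if a poll ever takes longer than the interval, a plain subtraction goes negative, and Start-Sleep does not accept a negative number of seconds. A minimal sketch of that arithmetic, pulled out into a hypothetical helper (Get-SleepSeconds is my own name, not an Azure cmdlet) so you can try it without a subscription:

```powershell
# Hypothetical helper: how long to sleep so ticks stay roughly $IntervalSeconds apart
function Get-SleepSeconds {
    param(
        [int]$IntervalSeconds,
        [TimeSpan]$Elapsed
    )
    $remaining = $IntervalSeconds - $Elapsed.TotalSeconds
    # Never return a negative wait
    return [Math]::Max(0, $remaining)
}

Get-SleepSeconds -IntervalSeconds 10 -Elapsed (New-TimeSpan -Seconds 3)    # 7
Get-SleepSeconds -IntervalSeconds 10 -Elapsed (New-TimeSpan -Seconds 12)   # 0
```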