Simple Azure VM Start/Stop Chaining using only Tags, Event Grid and Azure Functions

When you migrate VMs from on-premises to Azure, you always have to evaluate the required availability of each VM. Your decisions on VM size, storage tiers, and pricing options depend on this evaluation. In my current migration of on-premises Remote Desktop Services to Azure Virtual Desktop, we have a RemoteApp that is used quite irregularly: sometimes once a week, sometimes on one or two days, and some weeks not at all. So we go with pay-as-you-go for these VMs. We can handle this behavior easily in Azure Virtual Desktop (scheduled shutdown and start on connect), but that only covers the frontend VM. In my scenario, I have some additional backend VMs that host services the application needs (a license service and some web services for the DMS integration). We don’t need to run the backend VMs if nobody uses the frontend application, so I want to link the running state of these VMs to each other.

The frontend VM is started by AVD’s “Start VM on Connect” feature, and the required backend servers are automatically started and deallocated along with the frontend VM.

I know there are solutions using Event Grid + Logic Apps + Azure Automation. But serverless Azure Functions are usually more efficient in terms of scaling and pricing for short, event-driven tasks like this one.

To keep this how-to simple, I use just two VMs. It shouldn’t be hard to adapt it to more VMs, because in the end you only have to tag the dependent VMs with the same value.
So let’s start …

Create some VMs for testing

We create two resource groups for testing. In my example, one is named “Lab_init” and the other “Lab_triggered”. Placing a VM in “Lab_init” makes it a VM that can trigger the start/stop process, because the Event Grid topic we set up later only watches this resource group.

Now we create two VMs, one in the “Lab_init” resource group and one in the “Lab_triggered” resource group.

I’m going with Ubuntu this time, but it doesn’t really matter. We only want to start and stop, so go with whatever you prefer.

Next, we need to tag our VMs with a tag named bootbinding. The value can be whatever we want, but it has to match on all VMs that should be started and stopped together. The code of our function (we’ll get to it later) loops through all subscriptions and searches for VMs with the same value in the bootbinding tag.
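The matching rule the function applies can be tried out locally before touching Azure. This is a minimal sketch, assuming PowerShell 7; Get-BootBindingPeer is a hypothetical helper, and the mock objects only stand in for Get-AzVM output (only Name and Tags are used):

```powershell
# Hypothetical helper: names of all VMs that share the trigger VM's bootbinding
# value, excluding the trigger VM itself (same filter as in the function code)
function Get-BootBindingPeer {
    param(
        [Parameter(Mandatory)] [string]   $TriggerVmName,
        [Parameter(Mandatory)] [string]   $BindingGroup,
        [Parameter(Mandatory)] [object[]] $Vm
    )
    $Vm |
        Where-Object { ($_.Tags.bootbinding -eq $BindingGroup) -and ($_.Name -ne $TriggerVmName) } |
        ForEach-Object { $_.Name }
}

# Mock objects standing in for Get-AzVM output
$vms = @(
    [pscustomobject]@{ Name = 'initVM';      Tags = @{ bootbinding = 'lab-app' } }
    [pscustomobject]@{ Name = 'triggeredVM'; Tags = @{ bootbinding = 'lab-app' } }
    [pscustomobject]@{ Name = 'otherVM';     Tags = @{ bootbinding = 'other-app' } }
    [pscustomobject]@{ Name = 'untaggedVM';  Tags = @{} }
)

$peers = @(Get-BootBindingPeer -TriggerVmName 'initVM' -BindingGroup 'lab-app' -Vm $vms)
$peers   # only triggeredVM shares the 'lab-app' binding group
```

VMs without the tag simply yield an empty value and never match, so untagged VMs are ignored without extra checks.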

Set up Azure Function App

Now we get to the fun part.

We create a new Azure Function App. (The serverless Consumption plan is good enough for our needs 🙂)
Because our Function App needs to start and stop Azure VMs, possibly across multiple subscriptions, we enable a managed identity for it.
Then we assign the Virtual Machine Contributor role to this managed identity in every subscription that contains VMs that need to be triggered.
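If you prefer scripting the role assignments over clicking through each subscription, a sketch could look like this. It assumes the Az.Resources module for New-AzRoleAssignment; New-VmContributorAssignmentSplat is a hypothetical helper and all IDs are placeholders. The helper only builds the parameter sets, so you can review them before anything changes in Azure:

```powershell
# Hypothetical helper: one parameter set per subscription for New-AzRoleAssignment
# (Az.Resources). It only builds hashtables and does not touch Azure.
function New-VmContributorAssignmentSplat {
    param(
        [Parameter(Mandatory)] [string]   $PrincipalId,
        [Parameter(Mandatory)] [string[]] $SubscriptionId
    )
    foreach ($id in $SubscriptionId) {
        @{
            ObjectId           = $PrincipalId
            RoleDefinitionName = 'Virtual Machine Contributor'
            Scope              = "/subscriptions/$id"
        }
    }
}

# Placeholder IDs: replace with the managed identity's object ID and your subscriptions
$splats = @(New-VmContributorAssignmentSplat -PrincipalId '11111111-1111-1111-1111-111111111111' `
    -SubscriptionId 'aaaaaaaa-aaaa-aaaa-aaaa-aaaaaaaaaaaa', 'bbbbbbbb-bbbb-bbbb-bbbb-bbbbbbbbbbbb')
$splats | ForEach-Object { $_.Scope }

# After Connect-AzAccount, the assignments could be created with:
# foreach ($splat in $splats) { New-AzRoleAssignment @splat }
```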
Our Azure Function App needs some modules to do its job. We have to add these to the requirements.psd1 file.
Note: You shouldn’t add the full Az module, as it’s quite large. Only add the submodules you really need.
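A minimal requirements.psd1 for this function might look like the sketch below. The two submodules cover the cmdlets used in the code (Set-AzContext and Get-AzSubscription from Az.Accounts; Get-AzVM, Start-AzVM and Stop-AzVM from Az.Compute). The major versions are assumptions, so pin them to the releases that are current for you. Managed dependencies also have to be enabled in host.json (managedDependency.enabled), which new PowerShell function apps usually have by default.

```powershell
# requirements.psd1 (sketch): only the Az submodules this function uses.
# The major versions are assumptions; adjust them to the current releases.
@{
    'Az.Accounts' = '2.*'   # Set-AzContext, Get-AzSubscription
    'Az.Compute'  = '4.*'   # Get-AzVM, Start-AzVM, Stop-AzVM
}
```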
Now we create our function and select the “Azure Event Grid trigger” template.
We enter the following code for our function:
param($eventGridEvent, $TriggerMetadata)

# Make sure to pass hashtables to Out-String so they're logged correctly
# $eventGridEvent | Out-String | Write-Host

$tAction = ($eventGridEvent.data.authorization.action -split "/")[-2]
$tVmName = ($eventGridEvent.data.authorization.scope -split "/")[-1]
$tSubscriptionId = $eventGridEvent.data.subscriptionId

# preflight check
Write-Host "Check trigger action"
if (($tAction -ne "start") -and ($tAction -ne "deallocate")) {
    Write-Warning "Unsupported action: [$tAction], we stop here"
    # return (not break) cleanly ends this invocation outside of a loop
    return
}
Write-Host "##################### Triggerinformation #####################"
Write-Host "Vm: $tVmName"
Write-Host "Action: $tAction"
Write-Host "Subscription: $tSubscriptionId"

Write-Host "Get information about trigger vm"
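$context = Set-AzContext -SubscriptionId $tSubscriptionId -ErrorAction SilentlyContinue

if ($context.Subscription.Id -ne $tSubscriptionId) {
    # stop if the managed identity has no access
    throw "Azure Function has no access to subscription with id [$tSubscriptionId], check permissions of managed identity"
}

# the resource group is index 4 of the scope (/subscriptions/<id>/resourceGroups/<rg>/...)
$tResourceGroup = ($eventGridEvent.data.authorization.scope -split "/")[4]
$tVm = Get-AzVM -ResourceGroupName $tResourceGroup -Name $tVmName
$bindingGroup = $tVm.Tags.bootbinding

if (!$bindingGroup) {
    Write-Warning "No tag with bootbinding found for [$tVmName], check your tagging"
    return
}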

# main
Write-Host "Query all subscriptions"
$subscriptions = Get-AzSubscription

foreach ($sub in $subscriptions) {

    Write-Host "Set context to subscription [$($sub.Name)] with id [$($sub.id)]"
    $context = Set-AzContext -SubscriptionId $sub.Id -ErrorAction SilentlyContinue

    if (!$context) {

        # skip subscriptions the managed identity cannot access
        Write-Warning "Azure Function has no access to subscription with id [$($sub.Id)], check permissions of managed identity"
        continue
    }

    # get vms with bootbinding tag
    $azVMs = Get-AzVM -Status -ErrorAction SilentlyContinue |  Where-Object { ($_.Tags.bootbinding -eq $bindingGroup) -and ($_.Name -ne $tVmName) }
    if ($azVMs) {
        $azVMs | ForEach-Object {
            # keep a reference: inside the switch below, $_ is the switch value ($tAction), not the VM
            $vm = $_
            Write-Host "VM [$($vm.Name)] is in same binding-group, perform needed action"
            $vmSplatt = @{
                Name              = $vm.Name
                ResourceGroupName = $vm.ResourceGroupName
                NoWait            = $true
            }
            switch ($tAction) {
                start {
                    Write-Host "Start VM"
                    if ($vm.PowerState -ne 'VM running') { Start-AzVM @vmSplatt | Out-Null }
                    else { Write-Warning "$($vm.Name) is already running" }
                }
                deallocate {
                    Write-Host "Stop VM"
                    if ($vm.PowerState -ne 'VM deallocated') { Stop-AzVM @vmSplatt -Force | Out-Null }
                    else { Write-Warning "$($vm.Name) is already deallocated" }
                }
                Default {}
            }
        }
    }
}
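To see what the first lines of the function extract, here is a minimal sketch that runs the same parsing against a trimmed, hypothetical event. Only the fields the function reads are included, the IDs are placeholders, and Get-TriggerInfo is a hypothetical helper. It also shows where the resource group sits in the scope:

```powershell
# Hypothetical helper mirroring the parsing at the top of the function
function Get-TriggerInfo {
    param([Parameter(Mandatory)] $EventGridEvent)
    $scopeParts = $EventGridEvent.data.authorization.scope -split '/'
    [pscustomobject]@{
        # "Microsoft.Compute/virtualMachines/start/action" -> "start"
        Action         = ($EventGridEvent.data.authorization.action -split '/')[-2]
        # last segment of the scope is the VM name
        VmName         = $scopeParts[-1]
        # index 4 after splitting on "/" (index 0 is the empty string before the leading "/")
        ResourceGroup  = $scopeParts[4]
        SubscriptionId = $EventGridEvent.data.subscriptionId
    }
}

# Trimmed, hypothetical event: only the fields the function reads, placeholder IDs
$sampleEvent = @{
    data = @{
        subscriptionId = '00000000-0000-0000-0000-000000000000'
        authorization  = @{
            action = 'Microsoft.Compute/virtualMachines/start/action'
            scope  = '/subscriptions/00000000-0000-0000-0000-000000000000/resourceGroups/Lab_init/providers/Microsoft.Compute/virtualMachines/initVM'
        }
    }
}

$info = Get-TriggerInfo -EventGridEvent $sampleEvent
$info | Format-List
```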

Set up Event Grid

Thankfully, we can use an “Event Grid System Topic” for our solution, so we don’t have to write any code here. You can think of a topic as the source of the events we want to react to.
Because we want to react to events in our “Lab_init” resource group, we select “Resource Groups” as the topic type and “Lab_init” as the resource group.
To trigger something, we have to create an “Event Subscription”.
First, we give our Event Subscription a name and an endpoint. The endpoint defines what we want to trigger, in our case the Azure Function we just created.
We don’t want to call our function on every event in the resource group, so we filter for specific events. Otherwise, we get unnecessary function calls and have to filter the events in the function code, which we should avoid when the Event Subscription can do it for us. In the Basic section, we reduce invocations to successfully completed actions (“Resource Action Success”).
In the Filter section of our Event Subscription, we also add string filters for the subject. This way, the function is only triggered if the event comes from the Microsoft.Compute provider on a virtual machine.
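The Event Subscription evaluates these filters on the Event Grid side, so none of this runs in the function. To reason about which events get through, the combination can be mirrored locally. This is a minimal sketch, assuming the event type Microsoft.Resources.ResourceActionSuccess and a subject that is the resource path of the target and contains /providers/Microsoft.Compute/virtualMachines/; Test-VmActionEvent is a hypothetical helper, not part of Event Grid:

```powershell
# Hypothetical local mirror of the Event Subscription filters; in Azure the
# filtering happens in Event Grid, before the function is invoked
function Test-VmActionEvent {
    param(
        [Parameter(Mandatory)] [string] $EventType,
        [Parameter(Mandatory)] [string] $Subject
    )
    # Basic section: only successfully completed actions
    $isActionSuccess = $EventType -eq 'Microsoft.Resources.ResourceActionSuccess'
    # Filter section: only virtual machines of the Microsoft.Compute provider
    $isVm = $Subject -like '*/providers/Microsoft.Compute/virtualMachines/*'
    $isActionSuccess -and $isVm
}

$rg = '/subscriptions/00000000-0000-0000-0000-000000000000/resourceGroups/Lab_init'
$vmStart  = Test-VmActionEvent -EventType 'Microsoft.Resources.ResourceActionSuccess' -Subject "$rg/providers/Microsoft.Compute/virtualMachines/initVM"
$vmWrite  = Test-VmActionEvent -EventType 'Microsoft.Resources.ResourceWriteSuccess'  -Subject "$rg/providers/Microsoft.Compute/virtualMachines/initVM"
$nicEvent = Test-VmActionEvent -EventType 'Microsoft.Resources.ResourceActionSuccess' -Subject "$rg/providers/Microsoft.Network/networkInterfaces/initvm-nic"
"$vmStart $vmWrite $nicEvent"   # True False False
```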

Validate Setup

Now let’s test our configuration

We start our “initVM”.
In our Topic view, we see that some events are received by our Topic and also that some events are matched by our advanced filter.
The same information is available for our “Event Subscription”.
And we can also check our function output.

Log into our VMs

Check initVM
Check triggeredVM

As you can see, there is a difference of roughly 3 minutes between the boot times, so keep that in mind. In my AVD scenario it doesn’t really matter, because there is some buffer until the user logs in and starts the application. We have never had problems with that.

I hope this is useful for somebody; feel free to adjust it.

React to MEM Logs using Event Hubs and Azure Functions

Example: Convert User-Driven provisioned Autopilot Devices to Shared Devices

I got an interesting challenge from one of my customers.
Long story short, we have hybrid Azure AD joined devices (for no really good reason, I know 😉), and only “User Driven Provisioning” via Windows Autopilot is available for them (at the moment).
Because mobile devices are regularly shared for weekend tasks, the customer wants to allow every user to use the Company Portal on these devices.
The only way to accomplish this at the moment is to remove the primary user from the device in the MEM admin portal, because this “converts” the device into a shared device.
So the customer asked me to automate this whenever a new device is enrolled in Intune via Windows Autopilot.

In the past, I’ve often used Azure Monitor with Alerts and Runbooks to perform tasks like this.
But since I dug a little into Azure Functions in my last project, I decided to take an alternative approach this time. (Because Azure Functions are so damn awesome, and a shout-out to Laura Kokkarinen, her blog post helped me a lot: Link)

So let’s start. Here’s the plan:

  1. Stream the MEM operational logs and look for a specific event.
  2. Redirect the formatted output of that event to a new log stream.
  3. The redirected event triggers an Azure Function via a binding.
  4. The Azure Function calls Microsoft Graph with a token from the managed identity endpoint (that’s by far the coolest part).

Create the necessary Event Hubs

We create two Event Hubs in our new namespace, one for operational logs and one for the filtered enrollment events (we will get into that later).

Forward MEM Operational Log to Event Hub

Back in the MEM admin portal, we configure a diagnostic setting that forwards the operational logs to our first event hub.

Note: Now is a good time to enroll a test device, so you have some log entries to play with.

Configure Azure Stream Analytics Job

Let’s take a look at the log stream. We navigate back to our Event Hubs namespace and open the event hub we selected for log forwarding.

We save the query as a stream analytics job.
We add an output to our Stream Analytics job: the “newenrolleddevice” event hub.
Don’t forget to start the job (I unfortunately discovered this after 2 hours of troubleshooting :-))
Note: Again, now is a good time to enroll a device, so we can check whether entries arrive in our event hubs.

Create Azure Function and bind to Event Hub

We now create a new Azure Function App with your favorite runtime stack. I usually go with PowerShell, and my demo code is also written in PowerShell (so if you want to simply copy and paste, select PowerShell Core).
The remaining settings are fine by default, and the serverless (Consumption) plan is the most beautiful one :-). Application Insights gives us a history of our function executions.

Next, we create our first function in the new Function App and bind it to our “newenrolleddevice” event hub.
We click “Create”, and the portal brings us to our new function, where we go to the “Code + Test” section, enter the following code and click “Save”.

param($eventHubMessages, $TriggerMetadata)

# Write-Host "PowerShell event hub trigger function called for message array: $eventHubMessages"

$eventHubMessages | ForEach-Object {

    # get Intune device id
    $jsonOut = $_ | ConvertTo-Json -Depth 5
    Write-Host "Processing event: $jsonOut"

    $deviceID = $_.properties.IntuneDeviceId
    Write-Host "DeviceID: $deviceID"

    try {
        # request accesstoken from managed identity
        Write-Host "Trying to get authentication token from managed identity."
        $authToken = Receive-MyMsiGraphToken

        #Invoke REST call to Graph API
        Write-Host "Call Microsoft Graph to remove primary user from device."
        Remove-MyPrimaryUser -IntuneDeviceID $deviceID -AuthToken $authToken
    }
    catch {
        Write-Error $_
    }
}

As you can see, I use two helper functions in this example, “Receive-MyMsiGraphToken” and “Remove-MyPrimaryUser”. We add them to “profile.ps1”, which runs every time the function app does a cold start.
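The per-message part can also be hardened before Graph is called. A minimal sketch, assuming the Stream Analytics output puts the ID into properties.IntuneDeviceId as the code above expects; Get-EnrolledDeviceId is a hypothetical helper that skips messages without a valid GUID:

```powershell
# Hypothetical helper: extracts properties.IntuneDeviceId from each Event Hub
# message and skips messages without a valid GUID, so Graph is never called with junk
function Get-EnrolledDeviceId {
    param([object[]] $EventHubMessages)
    foreach ($message in $EventHubMessages) {
        $id = $message.properties.IntuneDeviceId
        $guid = [guid]::Empty
        if ($id -and [guid]::TryParse([string]$id, [ref]$guid)) {
            $guid.ToString()
        }
        else {
            Write-Warning "Skipping message without a valid IntuneDeviceId: [$id]"
        }
    }
}

# Mock messages shaped like the ones the function code above expects
$messages = @(
    @{ properties = @{ IntuneDeviceId = '3f2504e0-4f89-11d3-9a0c-0305e82c3301' } }
    @{ properties = @{ IntuneDeviceId = 'not-a-guid' } }
    @{ properties = @{} }
)

$deviceIds = @(Get-EnrolledDeviceId -EventHubMessages $messages)
$deviceIds   # only the valid GUID is returned
```

In the function, the loop could then run over Get-EnrolledDeviceId -EventHubMessages $eventHubMessages | ForEach-Object { ... } instead of the raw messages.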

We append the following code to our “profile.ps1” file.
function Receive-MyMsiGraphToken {
    $Scope = "https://graph.microsoft.com/"
    $tokenAuthUri = $env:IDENTITY_ENDPOINT + "?resource=$Scope&api-version=2019-08-01"

    $splatt = @{
        Method = "Get"
        Uri = $tokenAuthUri
        UseBasicParsing = $true
        Headers = @{
            "X-IDENTITY-HEADER" = "$env:IDENTITY_HEADER"
        }
    }
    $response = Invoke-RestMethod @splatt
    $accessToken = $response.access_token

    if ($accessToken) {
        return $accessToken
    }
    else {
        throw "Could not receive auth token for msgraph, maybe managed Identity is not enabled for this function"
    }
}
function Remove-MyPrimaryUser {
    param (
        $AuthToken,
        $IntuneDeviceID
    )
    $splatt = @{
        Method = "DELETE"
        Uri = "https://graph.microsoft.com/beta/deviceManagement/managedDevices('$IntuneDeviceID')/users/`$ref"
        UseBasicParsing = $true
        ContentType = "application/json"
        # ResponseHeadersVariable = "RES"
        Headers = @{
            'Authorization' = 'Bearer ' + $AuthToken
        }
    }
    $result = (Invoke-RestMethod @splatt).value

    if ([string]::IsNullOrEmpty($result)) {
        return $true
    }
    else {
        throw "Removing primary user from device ('$IntuneDeviceID') failed"
    }
}
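The two URIs are the parts that are easiest to get wrong, so here is a minimal sketch that only builds them. Get-MsiTokenUri and Get-PrimaryUserRefUri are hypothetical helpers; the endpoint below is a placeholder, the real value comes from IDENTITY_ENDPOINT at runtime. Escaping the resource with [uri]::EscapeDataString is an extra precaution the code above does not use, and building $ref from a single-quoted string avoids the backtick escape:

```powershell
# Hypothetical helper: token request URI for the managed identity endpoint
function Get-MsiTokenUri {
    param(
        [Parameter(Mandatory)] [string] $IdentityEndpoint,
        [string] $Resource = 'https://graph.microsoft.com/'
    )
    '{0}?resource={1}&api-version=2019-08-01' -f $IdentityEndpoint, [uri]::EscapeDataString($Resource)
}

# Hypothetical helper: Graph beta URI that removes the primary user reference
function Get-PrimaryUserRefUri {
    param([Parameter(Mandatory)] [guid] $IntuneDeviceId)
    # '$ref' in single quotes stays literal; in double quotes it needs a backtick
    "https://graph.microsoft.com/beta/deviceManagement/managedDevices('$IntuneDeviceId')/users/" + '$ref'
}

# Placeholder endpoint; at runtime the value comes from $env:IDENTITY_ENDPOINT
$tokenUri = Get-MsiTokenUri -IdentityEndpoint 'http://localhost:8081/msi/token'
$refUri   = Get-PrimaryUserRefUri -IntuneDeviceId '3f2504e0-4f89-11d3-9a0c-0305e82c3301'
$tokenUri
$refUri
```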

Add MS Graph permissions to the Azure Function App

Now everything is in place for the final part: we have to grant Microsoft Graph permissions to our Azure Function App.

We enable the managed identity for our function app and we copy the object ID to our clipboard because we need it in the next step.
# replace with your managed identity object ID
$miObjectID = "place your object id here"

# MS Graph app ID
$appId = "00000003-0000-0000-c000-000000000000"

# replace with the API permissions required by your app
$permissionsToAdd = @(
    "DeviceManagementManagedDevices.ReadWrite.All"
)

Connect-AzureAD

$app = Get-AzureADServicePrincipal -Filter "AppId eq '$appId'"

foreach ($permission in $permissionsToAdd) {
    $role = $app.AppRoles | Where-Object Value -Like $permission | Select-Object -First 1
    New-AzureADServiceAppRoleAssignment -Id $role.Id -ObjectId $miObjectID -PrincipalId $miObjectID -ResourceId $app.ObjectId
}


# Restart app after changing permission
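Where-Object returns nothing if a permission name is misspelled, and the loop above would then pass an empty role ID. A minimal guard, assuming the AzureAD app role objects expose Value, Id and AllowedMemberTypes; Resolve-AppRoleId is a hypothetical helper, tested here against mock roles with placeholder IDs:

```powershell
# Hypothetical guard: resolves permission names to application role IDs and
# throws on unknown names instead of passing an empty ID on
function Resolve-AppRoleId {
    param(
        [Parameter(Mandatory)] [object[]] $AppRoles,
        [Parameter(Mandatory)] [string[]] $Permission
    )
    foreach ($name in $Permission) {
        $role = $AppRoles |
            Where-Object { ($_.Value -eq $name) -and ($_.AllowedMemberTypes -contains 'Application') } |
            Select-Object -First 1
        if (-not $role) {
            throw "App role [$name] not found, check the permission name"
        }
        $role.Id
    }
}

# Mock roles with placeholder IDs; with AzureAD these would come from $app.AppRoles
$roles = @(
    [pscustomobject]@{ Value = 'DeviceManagementManagedDevices.ReadWrite.All'; Id = 'aaaaaaaa-0000-0000-0000-000000000001'; AllowedMemberTypes = @('Application') }
    [pscustomobject]@{ Value = 'User.Read.All'; Id = 'aaaaaaaa-0000-0000-0000-000000000002'; AllowedMemberTypes = @('Application') }
)

$roleIds = @(Resolve-AppRoleId -AppRoles $roles -Permission 'DeviceManagementManagedDevices.ReadWrite.All')
$typoFailed = $false
try { Resolve-AppRoleId -AppRoles $roles -Permission 'DeviceManagementManagedDevice.ReadWrite.All' | Out-Null }
catch { $typoFailed = $true }
```

With the real service principal, the loop could become foreach ($roleId in (Resolve-AppRoleId -AppRoles $app.AppRoles -Permission $permissionsToAdd)) { New-AzureADServiceAppRoleAssignment -Id $roleId -ObjectId $miObjectID -PrincipalId $miObjectID -ResourceId $app.ObjectId }.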

Try it out

We reset and re-enroll our test device/VM and take a look. After some time, we should see the message arrive in our “newenrolleddevice” event hub.

And some “success” messages in our function monitoring.

Some final words

  • This is only an example, so please feel free to select different tiers and plans to meet your needs
  • Why redirect into a new Event Hub? Mostly for demonstration purposes, but in an environment with thousands of clients this also reduces the number of function invocations, because the function only receives the filtered enrollment events.
  • The possibilities are basically endless: tagging based on geolocation, joining groups based on properties that are not supported at the moment, etc.

about Mr. Chris

Christopher is a walking Swiss Army knife for IT. For 20 years he has worked mostly with Microsoft technologies across the board, with some bigger sidesteps into networking and open-source configuration management. Currently he mainly works on hybrid environments, implementing various Microsoft cloud services and doing some magic with PowerShell ;-).