GLOBAL BEST PRACTICES FOR AZURE DATA FACTORY IMPLEMENTATION DUBAI – AUTO CHECKER SCRIPT V0.1
Building on the work detailed in my previous blog post (Best Practices for Implementing Azure Data Factory Dubai), I was tasked by my delightful boss with turning that content into a simple checklist of what and why that others could use. I did so, slightly reluctantly. However, I wanted to do something better than simply transcribe the previous blog post into a checklist, so I decided to break out the Shell of Power and attempt to automate said checklist.
Sure, a checklist could be picked up and used by anyone, with answers manually provided by the person inspecting a given ADF resource. But what if there was a way to have the results handed to you on a plate, including things that aren’t always easy to spot via the Data Factory UI?
Supported by friends in the community, I had a poke around looking at ways a PowerShell script could do this for Data Factory, while also thinking about a way to objectify an entire ADF instance. Sadly, hitting an existing ADF instance deployed in Azure wasn’t going to be an option:
- The Get-AzDataFactoryV2XXX cmdlets aren’t rich enough in terms of their outputs for the level of checks I wanted to perform.
- Permissions to production resources can often be an issue, especially when you want to run random bits of PowerShell against a business-critical environment.
- The checks needed to be done offline with the feedback given to a human to inform next steps. This is not intended to be a hard pass/fail test.
- It’s not always a given that a Data Factory instance will be connected to a source-controlled repository.
- Downloading a typical Az Resource template for an existing Data Factory isn’t yet supported using Get-AzResource.
- In the Azure Portal UI, viewing a Resource Group and trying to get an automation-generated ARM template doesn’t work for Data Factory either, as seen below.
With the above frustrations in mind, my current approach is to use the ARM template that you can manually export from the Data Factory developer UI. This can be downloaded and then used locally with PowerShell.
Once the export was downloaded and unzipped, this one JSON file (arm_template.json) gave me everything I needed in a single place to query the target Data Factory.
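To give a flavour of what "everything in a single place" means, here is a minimal sketch of loading the export and bucketing its resources by component type. It is written in Python purely for illustration (the script itself is PowerShell), and it assumes the export keeps every component in a top-level "resources" array with a type such as Microsoft.DataFactory/factories/pipelines. The sample template and the function name are made up for the example.

```python
import json
from collections import defaultdict

# Tiny stand-in for an exported arm_template.json (hypothetical content).
SAMPLE_TEMPLATE = """
{
  "resources": [
    {"name": "[concat(parameters('factoryName'), '/PL_Load')]",
     "type": "Microsoft.DataFactory/factories/pipelines",
     "properties": {"activities": []}},
    {"name": "[concat(parameters('factoryName'), '/DS_Sales')]",
     "type": "Microsoft.DataFactory/factories/datasets",
     "properties": {}},
    {"name": "[concat(parameters('factoryName'), '/LS_Sql')]",
     "type": "Microsoft.DataFactory/factories/linkedServices",
     "properties": {}}
  ]
}
"""


def group_resources(template_text):
    """Parse the template text and bucket its resources by component type."""
    template = json.loads(template_text)
    grouped = defaultdict(list)
    for resource in template.get("resources", []):
        # "Microsoft.DataFactory/factories/pipelines" -> "pipelines"
        kind = resource["type"].rsplit("/", 1)[-1]
        grouped[kind].append(resource)
    return dict(grouped)


if __name__ == "__main__":
    # For a real export, pass in the text of arm_template.json instead.
    for kind, items in group_resources(SAMPLE_TEMPLATE).items():
        print(f"{kind}: {len(items)}")
```

Once the resources are grouped like this, each check becomes a small query over one or two of the buckets.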
As a starting point for this script, I’ve created a set of 21 logic tests/checks using PowerShell to return details about the Data Factory ARM template. This includes the following:
- Pipeline(s) without any triggers attached, directly or indirectly.
- Pipeline(s) with an impossible AND/OR activity execution chain.
- Pipeline(s) without a description value.
- Pipeline(s) not organised into folders.
- Pipeline(s) without annotations.
- Data Flow(s) without a description value.
- Activity(ies) with timeout values still set to the service default value of 7 days.
- Activity(ies) without a description value.
- Activity(ies) ForEach iteration without a batch count value set.
- Activity(ies) ForEach iteration with a batch count size that is less than the service maximum.
- Linked Service(s) not using Azure Key Vault to store credentials.
- Linked Service(s) not used by any other resource.
- Linked Service(s) without a description value.
- Linked Service(s) without annotations.
- Dataset(s) not used by any other resource.
- Dataset(s) without a description value.
- Dataset(s) not organised into folders.
- Dataset(s) without annotations.
- Trigger(s) not used by any other resource.
- Trigger(s) without a description value.
- Trigger(s) without annotations.
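Three of the checks above (missing pipeline description, default activity timeout, ForEach without a batch count) can be sketched as follows. This is an illustration in Python, not the PowerShell from the script. It assumes an activity's timeout sits under policy.timeout as "7.00:00:00" and that a ForEach batch count sits under typeProperties.batchCount. The helper names and the sample pipeline are hypothetical, and only top-level activities are inspected.

```python
import re

DEFAULT_TIMEOUT = "7.00:00:00"  # assumed text form of the 7-day default


def short_name(arm_name):
    """Turn "[concat(parameters('factoryName'), '/PL_Load')]" into "PL_Load"."""
    match = re.search(r"/([^/']+)'\)\]$", arm_name)
    return match.group(1) if match else arm_name


def check_pipeline(resource):
    """Return a list of finding strings for one pipeline resource."""
    props = resource.get("properties", {})
    name = short_name(resource["name"])
    findings = []
    if not props.get("description"):
        findings.append(f"Pipeline '{name}' has no description.")
    # Only top-level activities are inspected; nested ones are ignored here.
    for activity in props.get("activities", []):
        label = f"Activity '{activity.get('name')}' in '{name}'"
        if activity.get("policy", {}).get("timeout") == DEFAULT_TIMEOUT:
            findings.append(f"{label} still uses the default timeout.")
        if activity.get("type") == "ForEach":
            if "batchCount" not in activity.get("typeProperties", {}):
                findings.append(f"{label} has no batch count set.")
    return findings


# Hypothetical pipeline resource, shaped like an entry in the export.
SAMPLE_PIPELINE = {
    "name": "[concat(parameters('factoryName'), '/PL_Load')]",
    "type": "Microsoft.DataFactory/factories/pipelines",
    "properties": {
        "activities": [
            {"name": "Copy Sales", "type": "Copy",
             "policy": {"timeout": "7.00:00:00"}, "typeProperties": {}},
            {"name": "Each File", "type": "ForEach",
             "typeProperties": {"items": {}, "activities": []}},
        ]
    },
}

if __name__ == "__main__":
    for finding in check_pipeline(SAMPLE_PIPELINE):
        print(finding)
```

The other checks follow the same pattern: pick the right group of resources, look for a missing or default property, and report what was found.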
Circa 1000+ lines of PowerShell later…
Each check has a severity rating (based on my experience), as in most cases there isn’t anything life-threatening here. It’s purely informational and based on the assumption that a complete round of integration testing has already been done for your Data Factory.
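One way to picture a severity-rated result is as a small record per finding that can be sorted and counted for the human reading the report. The sketch below is Python for illustration only. The severity levels (High, Medium, Low), the field names and the sample findings are all assumptions of mine for the example, not the ratings used in the script.

```python
from collections import Counter
from dataclasses import dataclass

# Hypothetical severity levels, most severe first.
SEVERITY_ORDER = {"High": 0, "Medium": 1, "Low": 2}


@dataclass
class Finding:
    component: str  # e.g. "Pipeline" or "Linked Service"
    name: str       # name of the offending resource
    check: str      # what the check found
    severity: str   # one of the keys in SEVERITY_ORDER


def most_severe_first(findings):
    """Order findings so the reader sees the most important ones first."""
    return sorted(findings, key=lambda f: SEVERITY_ORDER[f.severity])


def summarise(findings):
    """Count findings per severity level, skipping levels with no findings."""
    counts = Counter(f.severity for f in findings)
    return [(level, counts[level]) for level in SEVERITY_ORDER if counts[level]]


# Made-up findings; the severities assigned here are illustrative only.
SAMPLE_FINDINGS = [
    Finding("Pipeline", "PL_Load", "No description", "Low"),
    Finding("Linked Service", "LS_Sql", "Not using Key Vault", "High"),
    Finding("Dataset", "DS_Sales", "Not in a folder", "Low"),
]

if __name__ == "__main__":
    for level, count in summarise(SAMPLE_FINDINGS):
        print(f"{level}: {count}")
    for finding in most_severe_first(SAMPLE_FINDINGS):
        print(f"[{finding.severity}] {finding.component} {finding.name}: {finding.check}")
```

Nothing here passes or fails; the ordering just helps decide what to look at first.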
The intention of the script is to improve on the basics and add quality to a development that goes beyond simple functionality. For example, a Linked Service may operate perfectly well with credentials stored directly in its definition, but using Key Vault to handle those credentials is the better practice we should be working towards.
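The Key Vault example can be sketched as a check over the Linked Service resources in the export. This is a Python illustration, not the script's PowerShell. It assumes that a Key Vault-backed credential appears in typeProperties as an object whose type is AzureKeyVaultSecret, and that the Key Vault Linked Service itself has the type AzureKeyVault. The function names and sample resources are hypothetical.

```python
def contains_key_vault_secret(value):
    """Recursively look for an AzureKeyVaultSecret reference in a JSON value."""
    if isinstance(value, dict):
        if value.get("type") == "AzureKeyVaultSecret":
            return True
        return any(contains_key_vault_secret(v) for v in value.values())
    if isinstance(value, list):
        return any(contains_key_vault_secret(v) for v in value)
    return False


def linked_services_without_key_vault(resources):
    """Return names of Linked Services with no Key Vault secret reference."""
    flagged = []
    for resource in resources:
        if not resource.get("type", "").endswith("/linkedServices"):
            continue
        props = resource.get("properties", {})
        if props.get("type") == "AzureKeyVault":
            continue  # the Key Vault Linked Service itself is exempt
        if not contains_key_vault_secret(props.get("typeProperties", {})):
            flagged.append(resource["name"])
    return flagged


LS_TYPE = "Microsoft.DataFactory/factories/linkedServices"

# Hypothetical resources, shaped like entries in the export.
SAMPLE_RESOURCES = [
    {"name": "LS_KeyVault", "type": LS_TYPE,
     "properties": {"type": "AzureKeyVault",
                    "typeProperties": {"baseUrl": "https://example.vault.azure.net/"}}},
    {"name": "LS_SqlGood", "type": LS_TYPE,
     "properties": {"type": "AzureSqlDatabase",
                    "typeProperties": {"connectionString": {
                        "type": "AzureKeyVaultSecret",
                        "store": {"referenceName": "LS_KeyVault",
                                  "type": "LinkedServiceReference"},
                        "secretName": "sql-conn"}}}},
    {"name": "LS_SqlPlain", "type": LS_TYPE,
     "properties": {"type": "AzureSqlDatabase",
                    "typeProperties": {"connectionString": "Server=example"}}},
]

if __name__ == "__main__":
    print(linked_services_without_key_vault(SAMPLE_RESOURCES))
```

A check this simple would also flag Linked Services that authenticate without any stored secret (a managed identity, say), which is one more reason the output is advice for a human to read, not a hard pass/fail.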
Ok, enough preaching!
Here is an example of the PowerShell script v0.1 output from one of my boss’s Data Factory instances, certainly not one of mine!