How to use secrets without relying on .config files
This is part of a related set of posts describing challenges I have encountered and things I have learned on my MLOps journey on Azure. The first post is at: My MLOps Journey so far.
Naming of secrets
I usually prefer the Python approach to variable names, using underscores to separate words in a name like
my_variable
However, for secret names I have realized that it is easiest to use camel case, for example:
myVariable
since different services have different restrictions on allowed variable names.
Restrictions from Azure DevOps Pipelines
Non-secret variables in an Azure DevOps pipeline are automatically available as environment variables in a Python script run from the command line. For these non-secret environment variables, the variable name is changed automatically: all letters become uppercase and dots are replaced with underscores. If a variable name contains a hyphen, the pipeline breaks, and the same happens for variables read from a KeyVault. The issue with using variables with a hyphen in their name from a KeyVault is also described here.
To summarize, we can use uppercase letters, lowercase letters, and underscores in environment variable names in an Azure DevOps Pipeline.
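The renaming rule described above can be sketched in a few lines of Python. This is only an illustration of the rule as stated in this post (uppercase everything, replace dots with underscores); the function names `pipeline_env_name` and `get_pipeline_variable` are hypothetical helpers, not part of any Azure library.

```python
import os


def pipeline_env_name(variable_name: str) -> str:
    """Return the environment variable name a non-secret pipeline variable
    is expected to get: all uppercase, dots replaced with underscores."""
    return variable_name.upper().replace(".", "_")


def get_pipeline_variable(variable_name: str, environ=None) -> str:
    """Look up a non-secret pipeline variable by its original name.

    `environ` defaults to os.environ; passing a plain dict makes the
    helper easy to try out locally.
    """
    environ = os.environ if environ is None else environ
    return environ[pipeline_env_name(variable_name)]
```

With a helper like this, a script can keep using the camelCase name, for example `get_pipeline_variable("workspaceName")`, while the lookup itself uses `WORKSPACENAME`.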
Restrictions from Azure KeyVault
In Azure KeyVault, alphanumerics and hyphens (-) are allowed (see Use valid Key Vault Secret names). Note that underscores are not permitted.
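A quick way to catch an invalid secret name before it reaches the KeyVault is a small check like the one below. It is a minimal sketch: the character set comes from the rule above, while the length limit of 1 to 127 characters is my reading of the Key Vault naming rules and should be treated as an assumption. The function name is hypothetical.

```python
import re

# Alphanumerics and hyphens only; the 1-127 length bound is an assumption
# based on the documented Key Vault naming rules.
_KEY_VAULT_SECRET_NAME = re.compile(r"[0-9A-Za-z-]{1,127}")


def is_valid_key_vault_secret_name(name: str) -> bool:
    """Return True if `name` uses only characters allowed in secret names."""
    return _KEY_VAULT_SECRET_NAME.fullmatch(name) is not None
```

For example, `myVariable` and `my-variable` pass the check, while `my_variable` does not.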
Thoughts on selecting variable names
When choosing variable names, I like them to be descriptive so that I can remember what each of my variables holds. This is usually difficult to do with a single word, so most of my variable names will be a few words long. Separating words in some way makes it much easier to read the variable names. For example, it takes a bit longer to process myvariablename or MYVARIABLENAME than myVariableName or my_variable_name or my-variable-name.
Keeping the same variable name for the same variable across different places (e.g. KeyVault, Pipelines, and in Python scripts) makes it easier to keep track of the variables in use and to trace the origin of a variable value.
Ideally, it would be possible to have a way to separate words in a name as well as keeping the same name in different services. However, as we saw, underscores cannot be used to separate words across locations since they are not allowed in KeyVault secret names. On the other hand, hyphens cannot be used since they are not allowed in Pipelines. That leaves us with the camelCase option.
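If existing names already use underscores or hyphens, renaming them by hand is error-prone. The following is a small sketch of a converter to camelCase; `to_camel_case` is a hypothetical helper written for this post, and it assumes words are separated only by underscores or hyphens.

```python
import re


def to_camel_case(name: str) -> str:
    """Convert snake_case or kebab-case to camelCase.

    Words are split on runs of underscores or hyphens; the first word
    starts with a lowercase letter and each later word is capitalised.
    """
    words = [word for word in re.split(r"[_-]+", name) if word]
    if not words:
        return ""
    first, rest = words[0], words[1:]
    return first[0].lower() + first[1:] + "".join(
        word[0].upper() + word[1:] for word in rest
    )
```

The result contains only letters and digits, so the same name can be used in KeyVault, in Pipelines, and in Python scripts.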
For secret variables, the camelCase naming convention does not increase the verbosity of a Pipeline definition, since each secret variable needs to be passed explicitly to each task that needs it anyway. However, for non-secret variables, it would be convenient to leverage the automatic availability of Pipeline environment variables in Python scripts run from the pipeline. Unfortunately, as described above, the environment variables passed automatically are changed to all upper case. So either we have to explicitly define the non-secret variables we need for each Pipeline task and specify the desired name, as for the secret variables, or we have to remember to access the non-secret variables by their all-uppercase names from Python scripts. If I only have a few non-secret variables, I prefer the first option of explicitly defining the non-secret variables as well as the secret ones.
Summary
The camelCase naming convention for secrets and environment variables seems like the easiest way toward naming consistency across Azure services. Determining a naming convention as well as frequent abbreviations before starting a project will decrease the number of headaches later on when trying to find out which variable represents a specific secret or value.
Using variables in Azure DevOps Pipelines
Variables can be used in Azure DevOps Pipelines in a number of ways and for a number of reasons. One setting in which variables are useful is for testing code on different systems, as described in how to Create a multi-platform pipeline. This approach can also be used to test code on e.g. different Python versions as described in Build Python apps. In the following, I will describe some of the different variable options available in Pipelines.
Secret vs non-secret variables
I have used both secret and non-secret variables for my MLOps Pipelines. Some uses of non-secret variables have been to define the name of my workspace and experiment in one place. This approach made it easy to make sure that all scripts use the same experiment and workspace without having to check name consistency manually. It also makes it easy if I want to switch to another workspace or experiment at some point.
I have used secret variables for secrets like passwords to leverage the higher level of protection of the values in secret variables. For example, secret variables are printed as **** in logs whereas non-secret variables are not. As discussed above, secret variables are not automatically made available in scripts run from the Pipeline as is the case for non-secret variables, but need to be made available explicitly. The “env” section of a step in a Pipeline allows this as described in Set secret variables.
I have also used the option to link directly with an Azure KeyVault to access a collection of secrets pertaining to a collection of resources used together. Values from an Azure KeyVault are automatically treated as secrets. The first time a pipeline is run after adding a variable group from a KeyVault to the Pipeline definition, permission for the pipeline to access the KeyVault needs to be given manually in the Pipeline overview.
Single vs. groups of variables
Variables can be defined both individually and in groups, as shown here:
name: 'register_model_if_better_than_current_pipeline'
jobs:
- job: 'model_registration_job'
  pool:
    vmImage: 'ubuntu-18.04'
  variables:
  - group: private-packages
  - group: inflow-prediction-key-vault
  - group: inflow-prediction-dev-env-vars
  - name: PIP_CACHE_DIR
    value: $(Pipeline.Workspace)/.pip
Using variable groups makes it easy to leverage multiple variables without having to tediously type in values. Variable groups also ensure that the same variable values are used across pipelines, avoiding typos in variable values in individual pipelines.
Variable groups can be created in Azure DevOps under the Pipelines menu, in the Library. The view seen from the Library is shown below, where I have crossed out the names and dates of my variable groups. A new variable group is created by clicking the “+ Variable group” button.
Variables in a variable group can be marked as secret in Azure DevOps by clicking the padlock icon circled in green below. A variable group can also be linked to an Azure Key Vault by toggling the button circled in purple below.
Pass secrets from Azure DevOps Pipeline to Python script
As mentioned above in the “Thoughts on selecting variable names” section, non-secret variables are available as environment variables to Python scripts started from a pipeline. Secret variables, on the other hand, need to be passed explicitly like this:
- script: |
    source env/bin/activate
    python my_python_script.py
  displayName: 'My Python Script'
  env:
    SPN_PASSWORD: $(spn-password-variable-name)
    SPN_ID: $(spn-id-variable-name)
    TENANT_ID: $(tenant-id-variable-name)
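On the Python side, the script then reads these values from its environment. Below is a minimal sketch of what the top of `my_python_script.py` could look like. The environment variable names match the `env` section above, but the helper `load_service_principal_settings` and its error handling are my own illustration, not something Azure provides.

```python
import os

# Names must match the keys in the "env" section of the pipeline step.
REQUIRED_VARIABLES = ("SPN_PASSWORD", "SPN_ID", "TENANT_ID")


def load_service_principal_settings(environ=None) -> dict:
    """Read the secrets passed by the pipeline and fail early if any is missing.

    `environ` defaults to os.environ; a plain dict can be passed for local tests.
    """
    environ = os.environ if environ is None else environ
    missing = [name for name in REQUIRED_VARIABLES if not environ.get(name)]
    if missing:
        # Only the names are reported, never the secret values themselves.
        raise RuntimeError("Missing environment variables: " + ", ".join(missing))
    return {name: environ[name] for name in REQUIRED_VARIABLES}
```

Failing early with the names of the missing variables makes it much quicker to spot a forgotten entry in the `env` section than a later authentication error would.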
Access secrets from scoring script used by endpoint in Azure ML Studio
When using a pipeline to deploy an endpoint in Azure ML Studio, a Python script that defines the deployment needs to be called. This Python script, in turn, needs to specify another Python script that defines how the endpoint should respond when it receives a request. It took me a while to figure out how to make secrets and environment variables available to the scoring script. In the end, the solution I found was to define a dictionary with the variables I want to make available as environment variables to the scoring script, and set this dictionary in the attribute .environment_variables of the Environment object passed to the InferenceConfig object.
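The approach described above could look roughly like the sketch below. It assumes the v1 `azureml-core` SDK, where `Environment` is imported from `azureml.core` and `InferenceConfig` from `azureml.core.model`. The environment name `scoring-env`, the entry script name `score.py`, and both helper functions are hypothetical names chosen for illustration.

```python
import os


def collect_environment_variables(names, environ=None) -> dict:
    """Build the dictionary of variables to forward to the scoring script."""
    environ = os.environ if environ is None else environ
    missing = [name for name in names if name not in environ]
    if missing:
        raise KeyError("Missing environment variables: " + ", ".join(missing))
    return {name: environ[name] for name in names}


def build_inference_config(environment_variables: dict, entry_script: str = "score.py"):
    """Attach the variables to the Environment passed to InferenceConfig.

    The azureml imports are done lazily so the helper above can be used
    (and tested) on a machine without azureml-core installed.
    """
    from azureml.core import Environment
    from azureml.core.model import InferenceConfig

    environment = Environment(name="scoring-env")  # hypothetical name
    environment.environment_variables = environment_variables
    return InferenceConfig(entry_script=entry_script, environment=environment)
```

In the deployment script, this would be called as `build_inference_config(collect_environment_variables(["SPN_PASSWORD", "SPN_ID", "TENANT_ID"]))`, and the scoring script can then read the values with `os.environ["SPN_ID"]` and so on.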
Summary
I have described my experience and learnings using secrets to communicate with Azure services from Azure DevOps Pipelines and endpoints in the Azure ML Studio.
I would love to hear from you, especially if there is some of this you disagree with, would like to add to, or have a better solution for.