Leverage private packages when training and deploying models
This is part of a related set of posts describing challenges I have encountered and things I have learned on my MLOps journey on Azure. The first post is at: My MLOps Journey so far.
The Python scripts we use to submit experiments and deploy models on Azure Machine Learning (AML) Studio rely on a custom code base. This code base depends on a few other packages that we have available from a private Package Feed on Azure DevOps. In the following, I will describe my path to accessing the dependencies in our private Package Feed.
Stumbling blocks when trying to leverage packages from a private Package Feed
Dependencies ignored when specified in Environment instantiation
According to the documentation “Add packages to an environment”, dependencies can be specified in a PythonSection object, which is then given as input when instantiating the Environment object. However, I could not get this approach to work: the dependencies I specified were ignored.
What did work in the end was to define a CondaDependencies object and, after instantiating the Environment object, assign it to the environment's python.conda_dependencies field.
Connecting to private package feed does not work as specified in tutorial
According to the documentation section “Use a repository of packages from Azure DevOps feed”, the set_connection method on the Workspace object should enable a connection to our private Package Feed in Azure Artifacts. I was not able to get this approach to work.
In the end, I got the connection to the private Package Feed working by injecting a Personal Access Token (PAT) into the Package Feed URL. This URL, containing the PAT, is passed to pip as the --extra-index-url option, which is set on the CondaDependencies object. The PAT is stored in Azure Key Vault. The Azure DevOps pipeline that starts the Python script passes the PAT to it through an environment variable, and the script reads it when creating the AML environment.
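A minimal sketch of the URL construction described above. The function names, the redaction helper, and the example organisation, project, and feed values are illustrative assumptions; only the URL shape and the FeedAccess variable name are taken from the script further down. Percent-encoding the token and masking it before printing are defensive additions, not something the original script does.

```python
import os
from urllib.parse import quote


def build_feed_index_url(pat, company, project, feed):
    """Return a pip index URL with the PAT embedded in front of the host name."""
    if not pat:
        raise ValueError("the PAT must not be empty")
    # Percent-encode the token so characters such as '@' or '/' cannot break the URL.
    token = quote(pat, safe="")
    return (
        f"https://{token}@{company}.pkgs.visualstudio.com/"
        f"{project}/_packaging/{feed}/pypi/simple"
    )


def redact(text, pat):
    """Replace the (encoded) token with a placeholder so the text can be logged."""
    return text.replace(quote(pat, safe=""), "***")


# FeedAccess is the environment variable read by the script in this post;
# the fallback value only keeps this sketch runnable outside the pipeline.
pat = os.environ.get("FeedAccess", "dummy-token")
url = build_feed_index_url(pat, "mycompany", "myproject", "myfeed")
pip_option = "--extra-index-url " + url  # the string handed to set_pip_option
print(redact(pip_option, pat))
```

Printing only the redacted form keeps the token out of the pipeline logs, which matters because anyone who can read the full URL can use the PAT.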
Documentation specifies wrong scope necessary for PAT for feed access
The Azure documentation section “Use a repository of packages from Azure DevOps feed” instructs you to generate a Personal Access Token (PAT) with Read access to Packaging. However, this does not work, as seen from this ticket: https://developercommunity.visualstudio.com/content/problem/1163918/consume-private-package-from-azureml.html. To get the approach described on the documentation page “Use private Python packages with Azure Machine Learning” to work, the PAT access must be set to ‘all’.
Ensure code updates in private package are reflected in environment
To avoid building Docker images unnecessarily, AML does not build a new image if all package versions are the same as in an existing environment. Code changes in the private package therefore only reach the environment if the package version changes as well: after making the desired code changes, update the version of the private package and then rebuild the environment.
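One way to make the version update hard to forget is to automate it before the wheel is built. The helper below is a hypothetical sketch, not part of the original setup: it increments the last component of a dotted version string and assumes purely numeric versions such as 0.3.1.

```python
def bump_patch(version):
    """Return `version` with its last numeric component incremented by one."""
    parts = version.split(".")
    if not all(part.isdigit() for part in parts):
        raise ValueError(f"expected a dotted numeric version, got {version!r}")
    parts[-1] = str(int(parts[-1]) + 1)
    return ".".join(parts)


print(bump_patch("0.3.1"))  # prints 0.3.2
```

Where the version string lives (setup.py, a version file, a pipeline variable) depends on the project, so writing the bumped value back is left out here.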
The working script to create environment
Putting the above insights together, I wrote the following Python script. It creates an environment containing my private package, which in turn depends on other private packages available from our Feed in Azure DevOps (in the Artifacts tab).
import os
from glob import glob

from azureml.core.authentication import ServicePrincipalAuthentication
from azureml.core.conda_dependencies import CondaDependencies
from azureml.core.environment import Environment, DockerSection
from azureml.core import Workspace
from azureml.core.container_registry import ContainerRegistry

from <packageName>.utils.default_paths import get_path_to_project_directory


def create_aml_environment(ws):
    # Load the public dependencies from the conda environment file.
    dependency_file_path = os.path.join(get_path_to_project_directory(), 'azure_ml_scripts', 'conda_env_python37.yml')
    conda_dep = CondaDependencies(conda_dependencies_file_path=dependency_file_path)

    # Point pip at the private Package Feed, with the PAT injected into the URL.
    auth_url = "https://" + os.environ['FeedAccess'] + \
               "@<companyName>.pkgs.visualstudio.com/<projectName>/_packaging/<packageFeedName>/pypi/simple"
    conda_dep.set_pip_option("--extra-index-url " + auth_url)

    # Upload the last wheel (in sorted file name order) from dist/ and add it as a pip package.
    dist_path = os.path.join(get_path_to_project_directory(), 'dist')
    whl_filepaths = glob(os.path.join(dist_path, '<packageName>*.whl'))
    whl_filepaths.sort()
    whl_url = Environment.add_private_pip_wheel(
        workspace=ws,
        file_path=whl_filepaths[-1],
        exist_ok=True
    )
    conda_dep.add_pip_package(whl_url)

    # Use a base image from our own container registry.
    container_registry_def = ContainerRegistry()
    container_registry_def.address = "<myContainerRegistryName>.azurecr.io"
    container_registry_def.username = os.environ['SPN_ID']
    container_registry_def.password = os.environ['SPN_PASSWORD']
    docker_def = DockerSection(enabled=True, base_image_registry=container_registry_def,
                               base_image="<myContainerRegistryName>.azurecr.io/<imageName>:<myTag>")

    aml_env = Environment(name=os.environ['AMLEnvironmentName'], docker=docker_def,
                          inferencing_stack_version='latest')
    # Set the dependencies after instantiating the Environment object.
    aml_env.python.conda_dependencies = conda_dep

    return aml_env


def get_workspace():
    workspace_name = os.environ['AML_WORKSPACE_NAME']
    resource_group = os.environ['AMLResourceGroup']
    subscription_id = os.environ['SUBSCRIPTION_ID']

    # Authenticate as a service principal so the script can run unattended in the pipeline.
    svc_pr = ServicePrincipalAuthentication(
        tenant_id=os.environ['TENANT_ID'],
        service_principal_id=os.environ['SPN_ID'],
        service_principal_password=os.environ['SPN_PASSWORD'])

    ws = Workspace.get(subscription_id=subscription_id,
                       resource_group=resource_group,
                       name=workspace_name,
                       auth=svc_pr)
    return ws


def main():
    ws = get_workspace()
    aml_env = create_aml_environment(ws)
    registered_env = aml_env.register(workspace=ws)
    print("Created environment with name " + registered_env.name + " and version " + str(registered_env.version))
    print("Registered environment")


if __name__ == '__main__':
    main()
I call the above script from a pipeline in Azure DevOps. The pipeline first builds the .whl file so that it is available to the script.
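One caveat if more than one wheel ends up in the dist folder: the script picks the last file after a plain string sort, and a string sort ranks 0.9.0 above 0.10.0. The sketch below shows a version-aware alternative. The helper names and the example file names are hypothetical, and it assumes purely numeric versions and the standard wheel file name layout; pre-release versions would need a proper version parser.

```python
import os
import re

# Wheel file names follow <name>-<version>-<python tag>-<abi tag>-<platform tag>.whl
_WHEEL_VERSION = re.compile(r"^[^-]+-(\d+(?:\.\d+)*)-")


def version_key(whl_path):
    """Return the wheel's version as a tuple of ints, e.g. (0, 10, 0)."""
    match = _WHEEL_VERSION.match(os.path.basename(whl_path))
    if match is None:
        raise ValueError(f"cannot read a numeric version from {whl_path!r}")
    return tuple(int(part) for part in match.group(1).split("."))


def latest_wheel(whl_paths):
    """Pick the path with the highest version, comparing components numerically."""
    return max(whl_paths, key=version_key)


wheels = ["dist/mypkg-0.9.0-py3-none-any.whl", "dist/mypkg-0.10.0-py3-none-any.whl"]
print(sorted(wheels)[-1])    # plain string sort picks the 0.9.0 wheel
print(latest_wheel(wheels))  # numeric comparison picks the 0.10.0 wheel
```

If the pipeline always starts from a clean dist folder with a single freshly built wheel, the plain sort in the script is sufficient and this helper is unnecessary.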
Summary
I have described the main obstacles I encountered in using code from a private package with other private packages as dependencies.
I would love to hear from you, especially if there is some of this you disagree with, would like to add to, or have a better solution for.