Use deep learning model as preprocessing step
This is part of a related set of posts describing challenges I have encountered and things I have learned on my MLOps journey on Azure. The first post is My MLOps Journey so far.
In the project I worked on, we had image data available to help predict a numeric quantity. In the initial phases of the project, we tried to predict the quantity of interest with a deep learning model that took image data (in addition to tabular data) as input. However, this approach was not successful, most likely because data were scarce for the events of interest, when the quantity is substantially higher than at all other times.
Instead, we switched to a composite ML model approach. In this approach, we use a deep learning model with the image data as input to predict one of four categories. This prediction is then provided as input to a LightGBM model along with the other available tabular data.
The predictions from the deep learning model need to be available as quickly as possible, so that the deployed LightGBM model has the most recent data when predictions are requested. To be able to retrain the LightGBM model, the predictions from the deep learning model also need to be stored.
To make sure predictions from the deep learning model are available as quickly as possible and are stored, we use a Function App that is triggered whenever a blob is created in the storage for preprocessed image data. The Function App calls a REST Endpoint defined in Azure Machine Learning Studio, which returns predictions from the deep learning model. It then parses the response and inserts the predictions into a table in the Azure SQL Database that also holds the tables for the other tabular data. Below, I explain in more detail how the REST Endpoint was set up and how the Function App was defined.
REST Endpoint for predictions from deep learning model
In this project, the deep learning work was done before we started using MLOps. During development, we used data from one geographical location to identify the optimal cost function, model architecture, and hyperparameter ranges. Models were subsequently trained for this location and ten more. All eleven models needed to be called automatically to keep the database of predictions up to date, as explained above. The REST Endpoint was set up with the Azure ML Python SDK, using the following deployment script, which is run from a local computer where a .config file with secrets is stored:
```python
import configparser
import os

from azureml.core import Workspace
from azureml.core.authentication import ServicePrincipalAuthentication
from azureml.core.environment import Environment
from azureml.core.model import InferenceConfig, Model
from azureml.core.webservice import AciWebservice

from radar_to_flow.utils.default_paths import get_config_file_path
from radar_to_flow.utils.cloud_development_helpers import set_env_var_local_mode


def main():
    # Read secrets from the local .config file
    credentials_config_file_path = os.path.join(get_config_file_path())
    credentials_config = configparser.ConfigParser()
    credentials_config.read(credentials_config_file_path)
    table_store_config = credentials_config['AzureNowcastTableStorage']

    # Authenticate against the workspace with a service principal
    svc_pr = ServicePrincipalAuthentication(
        tenant_id=os.environ['tenant_id'],
        service_principal_id=os.environ['SPN_ID'],
        service_principal_password=os.environ['SPN_PASSWORD'])

    ws = Workspace.get(subscription_id=os.environ['subscription_id'],
                       resource_group=os.environ['resource_group'],
                       name=os.environ['AML_WORKSPACE_NAME'],
                       auth=svc_pr)

    # Reuse a registered environment; the scoring script reads the connection string as an env variable
    myenv = Environment.get(workspace=ws, name='development_r2bin_aml_env')
    myenv.environment_variables = {
        'AzureNowcastTableStorageConnectionString': table_store_config['connection_string']}

    deployment_config = AciWebservice.deploy_configuration(cpu_cores=3, memory_gb=10, auth_enabled=True)
    inference_config = InferenceConfig(entry_script="score_r2bin.py", environment=myenv,
                                       source_directory='supporting_files')

    # models=[] because the model files ship in source_directory, not via the model registry
    service = Model.deploy(
        ws, name="r2bin-all-gauges-batch", models=[], inference_config=inference_config,
        deployment_config=deployment_config, overwrite=True)
    service.wait_for_deployment(show_output=True)
    print(service.state)
    print(service.get_logs())


if __name__ == '__main__':
    set_env_var_local_mode()
    main()
```
The two custom functions, `get_config_file_path` and `set_env_var_local_mode`, do what their names imply: `get_config_file_path` gets the absolute path to the local .config file with secrets, and `set_env_var_local_mode` sets the necessary environment variables. My source directory, supporting_files, contains the deep learning model objects and the scoring script score_r2bin.py.
My scoring script is highly customized, but general advice on writing one can be found in, e.g., the Azure ML documentation article Advanced entry script authoring.
A scoring script needs to define two functions:
- init()
  This is called once, when the container that runs the inference code starts. I load the models here so that no time is spent loading them each time the endpoint is called.
- run(input)
  This is called every time the endpoint is called. Its input is whatever is sent to the endpoint in the request, e.g. data to be passed to the model, or something else that defines the desired prediction. In my case, the user does not have access to the input data, so I have defined the input to be the date for which a prediction is wanted.
I parse the input data to get the desired prediction date like this:
```python
import ast


class InputNotValid(Exception):
    """Raised when the request does not describe a single date or a date interval."""


data = ast.literal_eval(data)
if not valid_input(data):
    raise InputNotValid(
        "Input to service can either specify a single date to predict for (e.g. {'date': '15-12-2020 15:00'}) "
        "or a closed interval to predict for (e.g. {'from': '15-12-2020 15:00', 'to': '15-12-2020 16:00'}), "
        "but was " + str(data))
```
My data validation function looks like this:
```python
def valid_input(input_dict):
    # ast.literal_eval can also return a list, number, etc., which has no valid keys
    if not isinstance(input_dict, dict):
        return False
    # A single date: {'date': ...}
    if len(input_dict) == 1:
        return 'date' in input_dict
    # A closed interval: {'from': ..., 'to': ...}
    if len(input_dict) == 2:
        return 'from' in input_dict and 'to' in input_dict
    # Empty input or more than two keys
    return False
```
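Once the input has passed validation, the date strings still have to be turned into timestamps before Table Storage can be queried. A minimal sketch, assuming the day-month-year format used in the error message (15-12-2020 15:00, i.e. `%d-%m-%Y %H:%M`); the helper name `requested_interval` is hypothetical. It maps both request shapes onto one closed interval, so a single date is simply an interval whose start and end coincide, and the query code only has to handle one case.

```python
from datetime import datetime

# Format of the example dates, e.g. '15-12-2020 15:00' (day-month-year, 24-hour clock)
DATE_FORMAT = "%d-%m-%Y %H:%M"


def requested_interval(input_dict):
    """Hypothetical helper: turn validated input into a closed (start, end) interval."""
    if "date" in input_dict:
        # A single date is an interval with identical start and end
        start = end = datetime.strptime(input_dict["date"], DATE_FORMAT)
    else:
        start = datetime.strptime(input_dict["from"], DATE_FORMAT)
        end = datetime.strptime(input_dict["to"], DATE_FORMAT)
    if start > end:
        raise ValueError(f"'from' ({start}) is after 'to' ({end})")
    return start, end
```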
After parsing the input to get the date, I query my Table Storage to retrieve the corresponding preprocessed image data. I then pass this data to each of the deep learning models and collect the predictions in a DataFrame, which I return like this:
```python
return json.dumps(all_predictions.to_json())
```
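Note that `DataFrame.to_json()` already returns a JSON string, so `json.dumps` wraps that string in a second layer of JSON encoding, and whoever consumes the result has to decode it twice. The sketch below shows this with the standard library only: the dictionary stands in for what `to_json()` would produce with its default "columns" orient (one object per column, keyed by index label), and the values are made up. The Azure ML server may apply its own serialization on top when it builds the HTTP response, so it is worth inspecting the raw response body of your own endpoint.

```python
import json

# Stand-in for all_predictions.to_json(): already a JSON *string* (made-up values)
frame_json = json.dumps({"location_a": {"0": 2, "1": 3}})

# What the scoring script returns: json.dumps applied to that string
returned = json.dumps(frame_json)

# The first json.loads only gives back the inner JSON string...
first = json.loads(returned)

# ...and a second one recovers the predictions as a dict
predictions = json.loads(first)
print(predictions["location_a"]["1"])  # 3
```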
I put a try/except block around all of this code so that information about any unexpected exception is returned to the user.
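One way to structure such a catch-all is sketched below; it is an illustration, not my scoring script. `predict_for_request` is a hypothetical placeholder for the parsing, Table Storage query and prediction steps described above. Returning the exception type, message and traceback as JSON lets the caller see what failed instead of receiving an opaque server error.

```python
import json
import traceback


def predict_for_request(data):
    """Hypothetical placeholder for parsing, querying Table Storage and predicting."""
    raise NotImplementedError("replace with the real prediction pipeline")


def run(data):
    try:
        return predict_for_request(data)
    except Exception as exc:
        # Report what went wrong to the caller instead of failing opaquely
        return json.dumps({
            "error": type(exc).__name__,
            "message": str(exc),
            "traceback": traceback.format_exc(),
        })
```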
Function App that retrieves and stores deep learning predictions
To make sure that predictions are made as soon as new preprocessed image data is available, I have set up a Function App that is triggered when new data arrives in the relevant blob storage. If I were to do this again, I would look into having the Function App predict with the deep learning models directly, without calling another service.
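The Function App's final step, inserting the parsed predictions into a table in Azure SQL Database, can be sketched as follows. To keep the sketch runnable locally, `sqlite3` stands in for Azure SQL Database, and the table name `dl_predictions`, its columns and the example rows are all hypothetical. With `pyodbc` against Azure SQL, the same `?` placeholders and `executemany` call apply, although the CREATE TABLE statement would use SQL Server types.

```python
import sqlite3

# Hypothetical parsed response: (prediction time, location, predicted category 0-3)
rows = [
    ("2020-12-15 15:00", "location_a", 2),
    ("2020-12-15 15:00", "location_b", 0),
]


def store_predictions(conn, rows):
    """Insert (prediction_time, location, category) tuples in one transaction."""
    with conn:  # sqlite3: commits on success, rolls back on an exception
        # Parameterized insert: values are never formatted into the SQL string
        conn.executemany(
            "INSERT INTO dl_predictions (prediction_time, location, category) "
            "VALUES (?, ?, ?)",
            rows,
        )


# sqlite3 in memory as a local stand-in for Azure SQL Database
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE dl_predictions (prediction_time TEXT, location TEXT, category INTEGER)"
)
store_predictions(conn, rows)
```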
Before code can be deployed to a Function App, an empty Function App needs to be created. One way to do this is through the Azure Portal, where you can create a new service by searching for it by name.
Click the Function App search result, and on the page that opens, click Create.
After clicking Create, you go through a few steps to define the new Function App. For a Function App with Python code, choose Python as the Runtime stack, and fill in a value for Resource Group.
The name of the Function App must be globally unique, since it becomes part of the app's URL (<name>.azurewebsites.net).
To let Event Grid trigger the function, make sure that the “type” field of the binding in the function.json file is set to “eventGridTrigger”. My function.json file looks like this:
```json
{
  "scriptFile": "__init__.py",
  "bindings": [
    {
      "type": "eventGridTrigger",
      "name": "event",
      "direction": "in"
    }
  ]
}
```
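Inside the function, the Event Grid event has to be turned into a request to the scoring endpoint. A minimal standard-library sketch, under these assumptions: the BlobCreated event's data carries the blob `url` (part of the Azure Blob Storage event schema); the blob naming convention (a `YYYYmmddHHMM` timestamp as the file name stem) and the helper names are hypothetical; and the key-authenticated ACI endpoint (deployed with `auth_enabled=True`) expects an `Authorization: Bearer <key>` header. In `__init__.py`, `main(event)` would pass `event.get_json()` from the `azure.functions.EventGridEvent` to these helpers and then send the request, e.g. with `urllib.request.urlopen`.

```python
import json
from datetime import datetime
from pathlib import PurePosixPath
from urllib.parse import urlparse
from urllib.request import Request

# Hypothetical convention: blobs are named like 202012151500.npy
BLOB_STEM_FORMAT = "%Y%m%d%H%M"
# Date format the scoring script expects, e.g. '15-12-2020 15:00'
SCORING_DATE_FORMAT = "%d-%m-%Y %H:%M"


def date_from_blob_url(blob_url):
    """Derive the prediction date from the blob URL in a BlobCreated event."""
    stem = PurePosixPath(urlparse(blob_url).path).stem
    return datetime.strptime(stem, BLOB_STEM_FORMAT)


def build_scoring_request(event_data, scoring_uri, api_key):
    """Build (but do not send) the POST request for the deep learning endpoint."""
    date = date_from_blob_url(event_data["url"])
    # A JSON object is also a valid Python literal, so ast.literal_eval can parse it
    body = json.dumps({"date": date.strftime(SCORING_DATE_FORMAT)}).encode("utf-8")
    return Request(
        scoring_uri,
        data=body,
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Bearer {api_key}",
        },
        method="POST",
    )
```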
Triggering the Function App
To trigger the Function App when a new blob is created, go to the storage account you want to monitor and click Events in the menu on the left.
Then click + Event Subscription.
Now fill in the fields. The “Name” field in the Event Subscription Details is the name you choose for the new Event Subscription. In the Event Types section, you choose which storage events should trigger the Function App. Finally, you specify the endpoint to trigger, in our case a Function App. After you choose the subscription and resource group that the Azure Function was created under, and the appropriate slot, the function should appear in the Function drop-down menu.
After clicking Confirm Selection, the Function App should be triggered when one of your selected events occurs in the Blob Storage.
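Even with the Event Types restricted in the subscription, a defensive check in the function itself can ensure that only newly created blobs in the intended container lead to a scoring call. A sketch, assuming a hypothetical container named `preprocessed`; the event type `Microsoft.Storage.BlobCreated` and the subject format `/blobServices/default/containers/<container>/blobs/<path>` come from the Azure Blob Storage event schema. In the function, it would be called as `should_score(event.event_type, event.subject)`.

```python
BLOB_CREATED = "Microsoft.Storage.BlobCreated"
# Hypothetical container holding the preprocessed image data
CONTAINER = "preprocessed"
SUBJECT_PREFIX = f"/blobServices/default/containers/{CONTAINER}/blobs/"


def should_score(event_type, subject):
    """Return True only for new blobs in the preprocessed-data container."""
    return event_type == BLOB_CREATED and subject.startswith(SUBJECT_PREFIX)
```

The same prefix can also be used as a subject filter on the Event Subscription, so that unwanted events never reach the function.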
In practice, I was not able to get this approach, with Azure Function selected as the Endpoint Type, to work. What did work in the end was choosing WebHook as the Endpoint Type instead. The URL to give in the WebHook to make the Azure Function fire is: https://myuniquefunctionapp.azurewebsites.net/runtime/webhooks/EventGrid, where myuniquefunctionapp is replaced by the name you selected for your Function App, written in lowercase letters.
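According to the Azure Functions documentation for the Event Grid trigger, the webhook URL for a manually created subscription also carries two query parameters: `functionName`, the name of the function inside the app, and `code`, the app's Event Grid system key (typically listed under App keys as `eventgrid_extension`). A small helper to assemble it; the app, function and key values used with it are placeholders.

```python
from urllib.parse import urlencode


def event_grid_webhook_url(app_name, function_name, system_key):
    """Assemble the Event Grid webhook URL for a function in a Function App."""
    # urlencode also escapes characters such as '=' or '/' that keys may contain
    query = urlencode({"functionName": function_name, "code": system_key})
    return f"https://{app_name.lower()}.azurewebsites.net/runtime/webhooks/EventGrid?{query}"
```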
Summary
My main lessons from this are:
- Use the Function App to run the ML/deep learning model directly, rather than through an Azure ML Studio REST Endpoint, if the model does not need to be available for anything other than the Function App.
- Use WebHook as the Endpoint Type to set up the Event Subscription that triggers the Function App on Blob Storage events. I would be very happy to hear from anyone who knows why choosing Azure Function as the Endpoint Type did not work.
I would love to hear from you, especially if you disagree with any of this, would like to add to it, or have a better solution.