Deploying Models to Azure ML with Weights & Biases
What is Azure ML?
Azure Machine Learning (Azure ML) serves as a comprehensive cloud service designed to accelerate and enhance the machine learning project lifecycle for ML professionals, data scientists, and engineers. The platform helps with multiple stages of the ML workflow, including model training, deployment, and management, supporting a seamless integration with machine learning operations (MLOps).
Azure ML is compatible with a variety of open-source platforms like PyTorch, TensorFlow, and scikit-learn, allowing users to either develop new models or leverage existing ones within its ecosystem. Its MLOps tools are engineered to monitor, retrain, and redeploy models effectively, making it a robust environment for managing the entire lifecycle of machine learning models.

In terms of deployment, Azure ML offers extensive capabilities designed to ease the transition from model training to production. It provides managed endpoints for both real-time (online) and batch scoring (inferencing), enabling scalable model deployment without the need for extensive infrastructure management.
Azure ML’s managed services make the deployment process more efficient, abstracting the underlying infrastructure and making model operationalization more straightforward.
This integration capability makes Azure ML an ideal platform for setting up automatic connections with tools like Weights & Biases, enhancing productivity and enabling continuous delivery and monitoring of ML models in a secure and scalable manner. W&B Launch simplifies the scaling of training runs and the deployment of models, facilitating operations from desktop environments to compute resources like Azure ML.
Weights & Biases Launch with Azure ML

W&B Launch helps automate ML workflows and is structured around three primary components: launch jobs, queues, and agents.
Launch jobs serve as configurable blueprints for tasks, which are then placed into FIFO launch queues designated for specific compute resources, such as Azure ML environments.
Launch agents, deployed on the user’s infrastructure, periodically poll these queues and execute the jobs on the specified target resource.
This mechanism allows for the automated scaling of ML model deployments, making it particularly effective for practitioners looking to operationalize models within Azure ML’s ecosystem.
By configuring launch queues and agents according to Azure ML specifications, practitioners can automate the deployment of various model types directly into production environments, thereby reducing manual overhead and streamlining the end-to-end ML lifecycle. This integration between W&B Launch and Azure ML enables seamless transitions from model training to deployment, enhancing the efficiency of bringing ML models into production while maintaining the flexibility and control required for high-stakes, enterprise-level machine learning operations.
Azure ML Online Endpoints Launch Job

This W&B launch job (deploy_to_azureml in the wandb/launch-jobs repository) specifically addresses the deployment of models from W&B Artifacts to Azure ML Online Endpoints.
This job simplifies the process of taking a model artifact within W&B and deploying it directly to an AzureML Online Endpoint. It automatically identifies supported model types contained within the artifact, and creates the necessary deployment files, such as main.py or score.py.
This procedure enables the creation and activation of the endpoint and deployment, while incorporating logging capabilities that record each request made to the endpoint back into W&B. This logging includes details on inputs, outputs, and any encountered error messages, thereby improving traceability and debugging capabilities.
To execute this deployment, the job requires setting up a container environment, where Azure credentials are provided as environment variables. This setup ensures that the deployment container has access to the required Azure resources and configurations.
The job also employs ManagedIdentityCredential for authentication with AzureML, which demands read access to secrets within a specified Azure Key Vault for successful deployment. Supported model frameworks include TensorFlow, PyTorch, and ONNX, with specific requirements for each type regarding the format and structure of the input data. This approach ensures that a variety of model types can be seamlessly transitioned from W&B to operational status within AzureML, maintaining a consistent and efficient deployment pipeline for machine learning models.
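To make the Key Vault requirement concrete, here is a minimal sketch of what that permission boils down to in code. The vault and secret names (my-keyvault, wandb-api-key) are placeholders rather than the job's actual configuration; the point is simply that the managed identity must be allowed to read secrets from the configured vault:

from azure.identity import ManagedIdentityCredential
from azure.keyvault.secrets import SecretClient

# Placeholder names -- substitute the Key Vault configured for your deployment.
credential = ManagedIdentityCredential()
vault_name = "my-keyvault"
secret_client = SecretClient(
    vault_url=f"https://{vault_name}.vault.azure.net",
    credential=credential,
)

# If the managed identity lacks "get" permission on secrets, this call fails
# and the deployment cannot retrieve its credentials.
secret = secret_client.get_secret("wandb-api-key")
print(secret.name)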
How does the Launch job work?
The job begins by setting up a configuration environment, where Azure and W&B specific parameters are defined, including subscription and resource group details, as well as the model artifact path and deployment settings.
It then proceeds to infer the type and name of the model by examining the files within the W&B Artifact. This involves identifying model files based on their extensions and formats for supported types such as PyTorch, TensorFlow, and ONNX.
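A simplified sketch of what that extension-based inference can look like is shown below. The mapping and function name are illustrative only; the actual logic lives inside the deploy_to_azureml job and may differ in detail:

from pathlib import Path

# Illustrative mapping of file extensions to supported model types.
MODEL_EXTENSIONS = {
    ".pt": "torch",
    ".pth": "torch",
    ".pb": "tensorflow",
    ".savedmodel": "tensorflow",
    ".onnx": "onnx",
}

def infer_model_type(artifact_dir: str) -> tuple[str, str]:
    """Return (model_type, file_name) for the first supported model file found."""
    for path in Path(artifact_dir).rglob("*"):
        model_type = MODEL_EXTENSIONS.get(path.suffix.lower())
        if model_type:
            return model_type, path.name
    raise ValueError(f"No supported model file found in {artifact_dir}")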
The job generates a Conda YAML file based on the inferred model type to ensure the deployment environment has the necessary dependencies installed.
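As a rough illustration, generating such an environment file for an ONNX model might look like the sketch below. The exact package list is an assumption, not the job's actual output:

import yaml  # provided by the pyyaml package

# Hypothetical dependency set for an ONNX deployment.
conda_env = {
    "name": "wandb-azureml-deploy",
    "channels": ["conda-forge"],
    "dependencies": [
        "python=3.10",
        "pip",
        {"pip": ["onnxruntime", "numpy", "wandb", "azureml-inference-server-http"]},
    ],
}

with open("conda.yaml", "w") as f:
    yaml.safe_dump(conda_env, f, sort_keys=False)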
A main.py (or score.py) file gets dynamically generated to handle model loading, inference, and logging. This script is tailored to the specific model type and includes code for initializing W&B logging, processing input data, performing inference, and logging the output.
The script also incorporates error handling and logging mechanisms to track the inputs, outputs, and any errors that occur during the inference process.
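The generated script follows the standard Azure ML scoring-script contract: an init() function run once when the container starts and a run() function called per request. The sketch below assumes an ONNX model with a single tensor input and is only an approximation of what the job emits; the model filename and table schema are placeholders:

import json
import os

import numpy as np
import onnxruntime as ort
import wandb

session = None

def init():
    """Called once when the Azure ML deployment container starts."""
    global session
    model_dir = os.getenv("AZUREML_MODEL_DIR", ".")
    # Placeholder filename; the generated script knows the real model path.
    session = ort.InferenceSession(os.path.join(model_dir, "model.onnx"))
    wandb.init(project=os.getenv("WANDB_PROJECT"), job_type="azureml-inference")

def run(raw_data):
    """Called for every request made to the online endpoint."""
    columns = ["input", "output", "error"]
    try:
        data = np.array(json.loads(raw_data)["data"], dtype=np.float32)
        input_name = session.get_inputs()[0].name
        result = session.run(None, {input_name: data})[0].tolist()
        wandb.log({"requests": wandb.Table(columns=columns,
                                           data=[[raw_data[:200], str(result)[:200], ""]])})
        return result
    except Exception as e:
        wandb.log({"requests": wandb.Table(columns=columns,
                                           data=[[raw_data[:200], "", str(e)]])})
        raise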
With the environment and main script prepared, the job configures and creates an AzureML Managed Online Endpoint, setting authentication modes and tags as defined in the configuration.
A deployment is then created under this endpoint, specifying instance types, counts, and linking the prepared environment and code.
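With the Azure ML Python SDK v2, these two steps look roughly like the following. The endpoint name, deployment name, base image, and instance type are placeholder assumptions; the launch job fills these in from its configuration:

from azure.ai.ml import MLClient
from azure.ai.ml.entities import (
    CodeConfiguration,
    Environment,
    ManagedOnlineDeployment,
    ManagedOnlineEndpoint,
    Model,
)
from azure.identity import ManagedIdentityCredential

ml_client = MLClient(ManagedIdentityCredential(), subscription_id="...",
                     resource_group_name="...", workspace_name="...")

# Create (or update) the managed online endpoint with auth mode and tags.
endpoint = ManagedOnlineEndpoint(name="wandb-mrpc-endpoint", auth_mode="key",
                                 tags={"source": "wandb-launch"})
ml_client.online_endpoints.begin_create_or_update(endpoint).result()

# Create the deployment under that endpoint, linking the generated
# environment (conda.yaml) and scoring code (main.py).
deployment = ManagedOnlineDeployment(
    name="blue",
    endpoint_name=endpoint.name,
    model=Model(path="model.onnx"),
    environment=Environment(
        conda_file="conda.yaml",
        image="mcr.microsoft.com/azureml/openmpi4.1.0-ubuntu20.04:latest",
    ),
    code_configuration=CodeConfiguration(code=".", scoring_script="main.py"),
    instance_type="Standard_DS3_v2",
    instance_count=1,
)
ml_client.online_deployments.begin_create_or_update(deployment).result()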
Throughout the process, logging is extensively used to provide transparency and traceability, recording each significant action and decision.
Finally, the deployment is executed, and the W&B run is concluded, marking the completion of the deployment job.
Automatically Deploying ML Models to Azure Online Endpoints
Let's walk through a quick example to see how it all works in action. We'll focus on comparing two sentences to determine paraphrasing similarity, specifically targeting the GLUE benchmark's Microsoft Research Paraphrase Corpus (MRPC) task.
Training our model
By using the transformers library, alongside the report_to='wandb' integration, we automatically set up the training, logging, and checkpointing of our model dedicated to this specific task.
During the training phase, the integration provides a real-time, interactive platform to monitor various performance metrics. This includes but is not limited to, loss curves, accuracy measurements, and other relevant metrics such as F1 scores, which are particularly important for the MRPC task. The ability to log these results continuously allows for a transparent and comprehensive view of the model's performance throughout its training lifecycle.
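A minimal training sketch for this setup is shown below. It assumes bert-base-uncased as the base checkpoint and default hyperparameters; the only W&B-specific piece is report_to="wandb" in TrainingArguments:

from datasets import load_dataset
from transformers import (
    AutoModelForSequenceClassification,
    AutoTokenizer,
    Trainer,
    TrainingArguments,
)

# Assumption: bert-base-uncased; any sequence-classification checkpoint works similarly.
dataset = load_dataset("glue", "mrpc")
tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
model = AutoModelForSequenceClassification.from_pretrained("bert-base-uncased", num_labels=2)

def tokenize(batch):
    # MRPC pairs two sentences per example.
    return tokenizer(batch["sentence1"], batch["sentence2"], truncation=True)

tokenized = dataset.map(tokenize, batched=True)

args = TrainingArguments(
    output_dir="mrpc-finetune",
    report_to="wandb",            # stream metrics and checkpoints to W&B
    evaluation_strategy="epoch",
    num_train_epochs=3,
)

trainer = Trainer(model=model, args=args, train_dataset=tokenized["train"],
                  eval_dataset=tokenized["validation"], tokenizer=tokenizer)
trainer.train()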
[W&B charts panel: training and evaluation metrics for the run set]
Save the final trained model in a compatible format
The compatible format files for automatic deployment to Azure Online Endpoints via the W&B Launch Job are:
torch -- .pt, .pth
tf -- .savedmodel, .pb
onnx -- .onnx
The transformers library can convert trained models into these formats, thereby streamlining the process of preparing models for deployment. For instance, after fine-tuning a model on a specific task, such as the sentence paraphrasing similarity task from above, one can export the model to the ONNX format, which is widely recognized for its interoperability and efficiency in serving models.
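For example, a fine-tuned checkpoint can be exported with torch.onnx.export along the lines below. The checkpoint path, sample sentences, input names, and opset version are assumptions for illustration:

import torch
from transformers import AutoModelForSequenceClassification, AutoTokenizer

checkpoint = "mrpc-finetune"  # hypothetical path to the fine-tuned model
tokenizer = AutoTokenizer.from_pretrained(checkpoint)
# return_dict=False gives plain tuple outputs, which trace and export more cleanly.
model = AutoModelForSequenceClassification.from_pretrained(checkpoint, return_dict=False)
model.eval()

sample = tokenizer("The cat sat on the mat.", "A cat was sitting on the mat.",
                   return_tensors="pt")
torch.onnx.export(
    model,
    (sample["input_ids"], sample["attention_mask"]),
    "model.onnx",
    input_names=["input_ids", "attention_mask"],
    output_names=["logits"],
    dynamic_axes={"input_ids": {0: "batch", 1: "sequence"},
                  "attention_mask": {0: "batch", 1: "sequence"}},
    opset_version=14,
)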
Once the model is exported, it can be uploaded to W&B Artifacts, providing a centralized and version-controlled repository for storing model files. This ensures that the model can be easily retrieved and deployed using the WANDB_ARTIFACT_PATH configuration parameter in the W&B Launch Job.
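Logging the exported model as an Artifact takes a few lines of wandb code. The project and artifact names below mirror the example in this report but are otherwise up to you:

import wandb

run = wandb.init(project="azure_ml_test", job_type="export-model")
artifact = wandb.Artifact("onnx_model", type="model")
artifact.add_file("model.onnx")
run.log_artifact(artifact)
run.finish()

The resulting artifact path (for example, a-sh0ts/azure_ml_test/onnx_model:v0) is what you later pass as WANDB_ARTIFACT_PATH.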
Artifacts in W&B not only allow for the storage of models but also promote the tracking of model lineage and metadata, enhancing reproducibility and collaboration among team members. Below is one such example of an Artifact with a trained model in the .onnx format.
[W&B Artifact panel: onnx_model, direct lineage view]
Once saved, this .onnx model becomes part of the project’s lineage within W&B, which can be viewed under the Files tab in the W&B interface. Furthermore, the Lineage tab provides invaluable insights into the model's lifecycle, showing the specific training run that generated the model. This feature extends visibility into how the model was created, ensuring that all related metrics, code, and datasets are easily accessible for review and audit purposes.
Moreover, after executing the Launch Job, the deployment history of the model is traceable within the same Lineage tab. This allows users to see not only the origin of the model but also the various environments where it has been deployed, enhancing the understanding of the model’s performance and application in real-world post-production scenarios.
This level of traceability and management is crucial for maintaining the integrity of machine learning workflows and ensuring that models are deployed in a controlled and transparent manner.
Link Model to Registry (Optional but Recommended)

The Weights & Biases Model Registry offers a mechanism for managing machine learning models, providing an organized namespace and enhanced control features for the lifecycle management of ML models. By saving models to the W&B Model Registry, users can leverage a structured approach to versioning, tracking, and controlling access to their models.
Storing models in the W&B Model Registry also enables automation triggers, a feature that brings continuous integration and continuous deployment (CI/CD) practices from standard software engineering to model management. These automation triggers can be configured to perform specific actions based on predefined conditions or events. For instance, they can automatically execute W&B Launch jobs, such as deploying models to Azure ML Online Endpoints, whenever a new model version is marked for production or meets certain quality metrics.
Instead of executing launch commands manually each time a model is updated or a new version is created, the system automatically initiates the deployment process once the conditions defined by the automation triggers are met. This ensures that the most up-to-date models marked for production are deployed efficiently and consistently, without direct human intervention, thus accelerating the pace of model updates and deployments while minimizing the potential for human error.
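Linking a logged model into the Model Registry can be done from a run, roughly as follows. The registered-model name and alias are placeholders, and the exact registry path may differ depending on your entity setup:

import wandb

run = wandb.init(project="azure_ml_test", job_type="link-model")
artifact = run.use_artifact("onnx_model:latest", type="model")
# Placeholder registered-model name and alias; an automation can watch this
# collection and trigger the Azure ML deployment launch job on new versions.
run.link_artifact(artifact, "model-registry/MRPC Paraphrase Model", aliases=["production"])
run.finish()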
Running the Launch Job
Running the launch job is straightforward once the proper Azure credentials have been set:
# After running: git clone https://github.com/wandb/launch-jobs.git
docker buildx build -t $WANDB_NAME jobs/deploy_to_azureml

# If you're running from scratch, you may also need to pass the Azure resource
# env vars (which correspond to the config): AZURE_SUBSCRIPTION_ID,
# AZURE_RESOURCE_GROUP, AZURE_WORKSPACE, AZURE_KEYVAULT_NAME,
# AZURE_ENDPOINT_NAME, AZURE_DEPLOYMENT_NAME
docker run \
  -e WANDB_API_KEY=$WANDB_API_KEY \
  -e WANDB_ENTITY=$WANDB_ENTITY \
  -e WANDB_PROJECT=$WANDB_PROJECT \
  -e WANDB_NAME=$WANDB_NAME \
  -e AZURE_CLIENT_ID=$AZURE_CLIENT_ID \
  -e AZURE_CLIENT_SECRET=$AZURE_CLIENT_SECRET \
  -e AZURE_TENANT_ID=$AZURE_TENANT_ID \
  -e AZURE_SUBSCRIPTION_ID=$AZURE_SUBSCRIPTION_ID \
  -e AZURE_RESOURCE_GROUP=$AZURE_RESOURCE_GROUP \
  -e AZURE_WORKSPACE=$AZURE_WORKSPACE \
  -e AZURE_KEYVAULT_NAME=$AZURE_KEYVAULT_NAME \
  -e AZURE_ENDPOINT_NAME=$AZURE_ENDPOINT_NAME \
  -e AZURE_DEPLOYMENT_NAME=$AZURE_DEPLOYMENT_NAME \
  -e WANDB_ARTIFACT_PATH=$WANDB_ARTIFACT_PATH \
  --rm --net=host \
  $WANDB_NAME
Here, the key flag is -e WANDB_ARTIFACT_PATH=$WANDB_ARTIFACT_PATH. The path here would point to the model that is being deployed, a-sh0ts/azure_ml_test/onnx_model:v0 in our example from above.
As the launch job runs, verbose logs for the process are printed to stdout and saved to the azure-job run within the WANDB_PROJECT.

After running the launch job, the newly created endpoint and deployment should be visible in both the Azure portal and the Azure ML workspace.


Investigate the Logged Outputs of our ML Deployments
Once deployed, querying the endpoint is as simple as connecting to the resource and passing in the proper parameters:
from azure.ai.ml import MLClient
from azure.identity import DefaultAzureCredential
import json
import numpy as np
import torch

subscription_id = "..."
resource_group = "..."
workspace = "..."
endpoint_name = "..."
deployment_name = "..."

ml_client = MLClient(DefaultAzureCredential(), subscription_id, resource_group, workspace)

def numpify(t: torch.Tensor) -> np.ndarray:
    """Tensorflow and ONNX models expect NHWC format, but PyTorch uses NCHW.
    This function converts a PyTorch tensor to Numpy and transposes the axes."""
    a = t.numpy()
    return np.transpose(a, (0, 2, 3, 1))

t = torch.randn(1, 3, 224, 224)  # Replace with your model input shape
t = numpify(t)

with open("sample-request.json", "w") as f:
    json.dump({"data": t.tolist()}, f)

ml_client.online_endpoints.invoke(
    endpoint_name=endpoint_name,
    deployment_name=deployment_name,
    request_file="sample-request.json",
)
All requests are saved and viewable in the Tables logged within the WANDB_PROJECT.