
Building an MLOps Pipeline Using W&B and UbiOps

In this post we show how to connect UbiOps and Weights and Biases for training and deploying a machine learning model.

Introduction

We're going to use the UbiOps platform to train three models on the CIFAR-10 dataset. We'll be training them in parallel, in the cloud, using NVIDIA T4 GPUs.
While the training jobs are running, we'll head over to Weights & Biases to analyze performance metrics during our training runs and to compare the final models. After checking the accuracy metrics of all three training runs, we'll store our best-performing model on Weights & Biases and deploy it by turning it into a live, scalable API endpoint on UbiOps. In a production setup, the model can be conveniently exposed to customers via this API endpoint, allowing it to scale with demand.
Lastly, you can follow along by creating a free-tier account via this link and by navigating to this Google Colab!


A Brief Intro to UbiOps

UbiOps is an MLOps platform that helps you deploy machine learning models and pipelines as scalable inference endpoints. It also provides easy access to scalable, on-demand GPU compute.

Working with UbiOps and Weights & Biases

UbiOps takes your Python code, builds a Docker container with all necessary dependencies, and runs the job as a microservice in the cloud on Kubernetes. UbiOps can do model inference, creating an auto-scaling inference API endpoint, as well as run model training jobs, where the same code environment can be reused to run multiple training jobs in parallel on selected compute resources.
In our training script, we'll load the CIFAR-10 dataset, apply some preprocessing, and connect to our Weights & Biases project. The training job is wrapped in a Weights & Biases run so that we can actively monitor it. We can also monitor GPU usage along the way, download model checkpoints created by the Weights & Biases callbacks, and analyze the performance of our models during runtime!
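To make this concrete, the core of a training script for this setup might look roughly like the sketch below. This is a minimal sketch, not the actual code from the Colab: the train() entry point follows the UbiOps training interface, and the model architecture, Weights & Biases project name, and layer choices are illustrative assumptions.

import tensorflow as tf
import wandb
from wandb.keras import WandbCallback


def train(training_data, parameters, context=None):
    # UbiOps calls train() and passes the hyperparameters we submit with each run
    # via `parameters`. CIFAR-10 is loaded from Keras, so training_data is unused here.
    # A WANDB_API_KEY is assumed to be available in the environment.
    (x_train, y_train), (x_test, y_test) = tf.keras.datasets.cifar10.load_data()
    x_train, x_test = x_train / 255.0, x_test / 255.0

    # Wrap the training job in a W&B run so metrics and GPU usage are logged live
    run = wandb.init(project="cifar10-training", config=parameters)

    model = tf.keras.Sequential([
        tf.keras.Input(shape=(32, 32, 3)),
        tf.keras.layers.RandomRotation(parameters["random_rotation"]),
        tf.keras.layers.Conv2D(32, 3, activation="relu"),
        tf.keras.layers.MaxPooling2D(),
        tf.keras.layers.Flatten(),
        tf.keras.layers.Dense(64, activation="relu"),
        tf.keras.layers.Dense(10),
    ])
    model.compile(
        optimizer=parameters["optimizer"],
        loss=tf.keras.losses.SparseCategoricalCrossentropy(from_logits=True),
        metrics=["accuracy"],
    )

    # The W&B Keras callback streams metrics and uploads model checkpoints to the run
    model.fit(
        x_train, y_train,
        validation_data=(x_test, y_test),
        epochs=parameters["nr_epochs"],
        batch_size=parameters["batch_size"],
        callbacks=[WandbCallback(save_model=True)],
    )
    run.finish()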
So, to run our training scripts, we first need to set up the code environment. This environment includes the Python version we're working with, the dependencies used in our training code, and pre-installed CUDA drivers. After setting up this environment once, we can reuse it to run different training jobs with an identical runtime.
The UbiOps interface of an environment. Environments contain the dependencies and files that your scripts need to run, and can be reused for different training jobs and deployments. Here, the environment only contains some pip packages.
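Such an environment can also be created through the UbiOps Python client. The sketch below is only a rough outline under a few assumptions: the environment name and base-environment identifier are made up, PROJECT_NAME and the API token are assumed to be set up earlier (for example in the Colab), and the exact calls may differ between client versions.

import ubiops

# Connect to UbiOps (PROJECT_NAME and the API token are assumed to be defined earlier)
configuration = ubiops.Configuration(api_key={"Authorization": "Token <YOUR_API_TOKEN>"})
api_client = ubiops.ApiClient(configuration)
core_api = ubiops.CoreApi(api_client)

ENVIRONMENT_NAME = "cifar10-training-env"  # illustrative name

# Create a custom environment on top of a CUDA-enabled Python base environment
# (the base environment identifier is an assumption; check the UbiOps docs)
core_api.environments_create(
    project_name=PROJECT_NAME,
    data=ubiops.EnvironmentCreate(
        name=ENVIRONMENT_NAME,
        base_environment="python3-10-cuda",
    ),
)

# Upload a zipped package that contains the requirements.txt with our pip packages
core_api.environment_revisions_file_upload(
    project_name=PROJECT_NAME,
    environment_name=ENVIRONMENT_NAME,
    file="environment_package.zip",
)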
We'll then create a UbiOps experiment, which groups different training runs, and select compute resources that include an NVIDIA T4 GPU. When we submit a training job, the training code runs on top of our environment on the selected compute resource. Within this experiment, we can easily try out different training scripts, or run the same training code with different hyperparameters. In this example, we'll do the latter.
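Creating the experiment can likewise be done with a few client calls. Again, this is a sketch: the experiment name and default bucket are illustrative, and the T4 instance type identifier is an assumption based on the UbiOps naming scheme.

from ubiops.training.training import Training

# Training client, reusing the api_client from the environment sketch above
training_instance = Training(api_client)

EXPERIMENT_NAME = "cifar10-hyperparameter-search"  # illustrative name

training_instance.experiments_create(
    project_name=PROJECT_NAME,
    data=ubiops.ExperimentCreate(
        name=EXPERIMENT_NAME,
        description="Compare batch sizes, optimizers and image rotations on CIFAR-10",
        environment=ENVIRONMENT_NAME,
        instance_type="16384mb_t4",  # assumed identifier for an NVIDIA T4 instance
        default_bucket="default",
    ),
)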

Launching our training jobs on UbiOps

Within our experiment, we will run three training jobs with different batch sizes, optimizers, and random rotations of our images. We can initiate the runs via the UI, or via a few lines of code:
data_experiments = [
    {
        "batch_size": 32,
        "nr_epochs": 50,
        "optimizer": "adam",
        "random_rotation": 0.1
    },
    {
        "batch_size": 64,
        "nr_epochs": 50,
        "optimizer": "adam",
        "random_rotation": 0.25
    },
    {
        "batch_size": 16,
        "nr_epochs": 50,
        "optimizer": "sgd",
        "random_rotation": 0.25
    }
]

for index, data_experiment in enumerate(data_experiments):
    new_run = training_instance.experiment_runs_create(
        project_name=PROJECT_NAME,
        experiment_name=EXPERIMENT_NAME,
        data=ubiops.ExperimentRunCreate(
            name=f"training-run-{index}",
            description=f'Trying out a run with {data_experiment["nr_epochs"]} epochs and batch size {data_experiment["batch_size"]}',
            training_code='training_code/train.py',
            parameters=data_experiment
        )
    )

We can check the statuses and logs of our training runs in the UI of UbiOps:

From the experiment interface on UbiOps, we can see that two training jobs are running while a third is still pending, which indicates that its on-demand GPU is still booting up.
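The same run status can also be polled from Python. A small sketch, assuming the experiment_runs_get call is available in the installed client version and reusing new_run from the loop above:

# Poll the status of the last submitted run
run_details = training_instance.experiment_runs_get(
    project_name=PROJECT_NAME,
    experiment_name=EXPERIMENT_NAME,
    run_id=new_run.id,
)
print(run_details.status)  # e.g. 'pending', 'processing' or 'completed'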
Using Weights & Biases, we get a more detailed understanding of the performance of the different runs, during and after training. Note that the GPUs are being used at full capacity!

From the Weights & Biases interface, we can track the metrics of our training runs while they are executing. Additionally, we get insights into metrics related to GPU usage!
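One way to compare the runs programmatically is through the W&B public API. A minimal sketch, assuming the training project path is stored in WANDB_TRAINING_PROJECT and the validation accuracy is logged under val_accuracy:

import wandb

# List the runs in the training project and compare their final metrics
wandb_api = wandb.Api()
runs = wandb_api.runs(WANDB_TRAINING_PROJECT)  # e.g. "<entity>/<training-project>"

for run in runs:
    print(run.id, run.config.get("optimizer"), run.summary.get("val_accuracy"))

# Pick the run with the highest final validation accuracy
best_run = max(runs, key=lambda r: r.summary.get("val_accuracy", 0))
print("Best run:", best_run.id)

The identifier of this best run is what we fill in as best_training_run in the snippet below.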

Using Weights & Biases to store our best model

Using this information, we can select the model with the highest final validation accuracy. We use the Weights & Biases API to download the best model from the best run and upload it to a production project in Weights & Biases, which we've called 'production-environment' here:
import wandb

# Download the best model from our best run
best_training_run = " "
tmp_folder = 'artifact_folder'

# Specifying key with api_key argument gives errors sometimes
wandb_api = wandb.Api()
print(f"{WANDB_TRAINING_PROJECT}/model-{best_training_run}:latest")
artifact_obj = wandb_api.artifact(f"{WANDB_TRAINING_PROJECT}/model-{best_training_run}:latest", type='model')
artifact_obj.download(tmp_folder)

# And upload the artifact to our production environment
with wandb.init(project="production-environment", job_type="model") as run:
    artifact = wandb.Artifact('production-models', type='model')
    artifact.add_dir(tmp_folder)
    run.log_artifact(artifact)


Deploying our Model on UbiOps

Next, we're going to deploy the model and create an inference endpoint on UbiOps. This is called a 'deployment' in UbiOps, and it contains the Python code below, which is again executed in an environment with the proper dependencies loaded. We use the initialization function of our deployment to grab the latest model from the production environment and load it into memory, while the request function classifies a new input image. The final deployment code looks as follows:
import numpy as np
import tensorflow as tf
import wandb
from imageio.v3 import imread


class Deployment:

    def __init__(self):
        print("Initialising deployment")
        # Make a connection to the wandb API here and download the latest production model
        wandb_api = wandb.Api()
        artifact_obj = wandb_api.artifact('production-environment/production-models:latest')
        artifact_path = 'artifact_folder'
        artifact_obj.download(artifact_path)
        self.model = tf.keras.models.load_model(artifact_path)
        self.cifar_classes = ("Airplane", "Automobile", "Bird", "Cat", "Deer", "Dog", "Frog", "Horse", "Ship", "Truck")

    def request(self, data):
        print("Processing request")
        x = imread(data['image'])

        # Check that the image is 32x32 pixels
        assert x.shape == (32, 32, 3)

        # Convert to a 4D tensor to feed into our model
        x = x.reshape(1, 32, 32, 3)
        x = x.astype(np.float32) / 255

        out = self.model.predict(x)
        prediction = self.cifar_classes[int(np.argmax(out))]

        # Here we set our output parameters in the form of a JSON
        return {'prediction': prediction}
We can upload our deployment files to UbiOps and specify scaling and request retention settings along the way. We choose a CPU instance with 1 GB of RAM and set maximum_instances = 3 so that we can handle peak workloads. We also store request input and output, so that we can measure the performance of our model in production.
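For reference, roughly what those steps could look like through the client. This is a sketch under assumptions: the deployment name, version name, and instance type identifier are made up, the retention settings are simplified, and the environment from earlier is reused; the input and output field names match the deployment code above.

DEPLOYMENT_NAME = "cifar10-classifier"  # illustrative name

# Define the deployment interface: a file input and a string prediction output
core_api.deployments_create(
    project_name=PROJECT_NAME,
    data=ubiops.DeploymentCreate(
        name=DEPLOYMENT_NAME,
        input_type="structured",
        output_type="structured",
        input_fields=[ubiops.DeploymentInputFieldCreate(name="image", data_type="file")],
        output_fields=[ubiops.DeploymentOutputFieldCreate(name="prediction", data_type="string")],
    ),
)

# Create a version on a 1 GB CPU instance that can scale out to three instances,
# and keep the request input and output for later analysis
core_api.deployment_versions_create(
    project_name=PROJECT_NAME,
    deployment_name=DEPLOYMENT_NAME,
    data=ubiops.DeploymentVersionCreate(
        version="v1",
        environment=ENVIRONMENT_NAME,
        instance_type="1024mb",  # assumed identifier for a 1 GB CPU instance
        minimum_instances=0,
        maximum_instances=3,
        request_retention_mode="full",  # store both request input and output
    ),
)

# Upload the zipped deployment package containing the Deployment class above
core_api.revisions_file_upload(
    project_name=PROJECT_NAME,
    deployment_name=DEPLOYMENT_NAME,
    version="v1",
    file="deployment_package.zip",
)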
After uploading our deployment package to UbiOps, we can send requests to the model API. This (blurry) image of a frog is sent to the inference deployment. Looking at the image showing the results, we can see that it is classified correctly!
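Programmatically, sending such a request might look like the sketch below, assuming a local test image frog.jpg, the file-upload helper in ubiops.utils, and a bucket called 'default':

from ubiops import utils

# Upload the test image to a UbiOps bucket and reference it in the request
file_uri = utils.upload_file(api_client, PROJECT_NAME, "frog.jpg", bucket_name="default")

request = core_api.deployment_requests_create(
    project_name=PROJECT_NAME,
    deployment_name=DEPLOYMENT_NAME,
    data={"image": file_uri},
)
print(request.result)  # e.g. {'prediction': 'Frog'}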



Conclusion

And that's it! We've used the training insights from Weights & Biases and the compute resources and deployment capabilities of UbiOps to create a training and inference pipeline, resulting in a live and scalable model. We can reach our model via its API endpoint, provided we supply the correct authentication credentials.
After setting up the baseline model, you can easily add new deployment versions and tweak the scaling and other operational settings. You can scale the deployment down to zero in the development phase, and scale up if you want to run multiple inference jobs in parallel! Using the monitoring tabs, you can actively track when and how often your model was requested. If you find that the model is underperforming due to, for example, data drift, or if you find other ways of improving it, you can easily initiate more training runs and store a better model in the default location on Weights & Biases. This way, you ensure that the best model is always used in production!
Do you want to try out this workflow for your own training runs? Feel free to sign up at UbiOps and let us know what you think!