Building an MLOps Pipeline Using W&B and UbiOps
In this post we show how to connect UbiOps and Weights & Biases for training and deploying a machine learning model.
Introduction
We're going to use the UbiOps platform to train three models on the CIFAR-10 dataset. We'll be training them in parallel, in the cloud, using NVIDIA T4 GPUs.
While the training jobs are running, we'll head over to Weights & Biases to analyze performance metrics during our training runs and to compare the final models. After checking the accuracy metrics of all three training runs, we'll store our best-performing model on Weights & Biases and deploy it by turning it into a live, scalable API endpoint on UbiOps. In a production setup, the model can be conveniently exposed to customers via this API endpoint, allowing it to scale with demand.
You can follow along by creating a free-tier account via this link and by navigating to this Google Colab!
A Brief Intro to UbiOps
UbiOps is an MLOps platform that helps you deploy machine learning models and pipelines as scalable inference endpoints. It also provides easy access to scalable, on-demand GPU compute.
Working with UbiOps and Weights & Biases
UbiOps will take your Python code, create a Docker container with all necessary dependencies, and run the job as a microservice in the cloud on Kubernetes. UbiOps can do model inference, where it creates an auto-scaling inference API endpoint, as well as run model training jobs, where the same code environment can be reused to run multiple training jobs in parallel on selected compute resources.
In our training script, we'll load the CIFAR-10 dataset, apply some preprocessing, and connect to our Weights & Biases project. The training job is wrapped in a Weights & Biases run so that we can actively monitor it. Along the way, we can also monitor GPU usage, download model checkpoints created by the Weights & Biases callbacks, and analyze the performance of our models during runtime!
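To make this concrete, here is a minimal sketch of what such a train.py could look like. It assumes a small Keras model, the UbiOps training convention of a train() entry point that receives the run's hyperparameters, and a W&B API key available as an environment variable; the project name, model architecture, and return value are illustrative, and the full script lives in the Colab.

import tensorflow as tf
import wandb
from wandb.keras import WandbCallback


def train(training_data, parameters, context=None):
    # Load and normalize the CIFAR-10 dataset
    (x_train, y_train), (x_test, y_test) = tf.keras.datasets.cifar10.load_data()
    x_train, x_test = x_train / 255.0, x_test / 255.0

    # Wrap the training job in a W&B run so we can monitor it live
    # ("training-environment" is an example W&B project name)
    run = wandb.init(project="training-environment", config=parameters)

    # A small CNN; the random rotation is one of the hyperparameters we vary
    model = tf.keras.Sequential([
        tf.keras.layers.RandomRotation(parameters["random_rotation"], input_shape=(32, 32, 3)),
        tf.keras.layers.Conv2D(32, 3, activation="relu"),
        tf.keras.layers.MaxPooling2D(),
        tf.keras.layers.Flatten(),
        tf.keras.layers.Dense(64, activation="relu"),
        tf.keras.layers.Dense(10),
    ])
    model.compile(
        optimizer=parameters["optimizer"],
        loss=tf.keras.losses.SparseCategoricalCrossentropy(from_logits=True),
        metrics=["accuracy"],
    )

    # The W&B callback streams metrics and stores model checkpoints as artifacts
    history = model.fit(
        x_train, y_train,
        validation_data=(x_test, y_test),
        epochs=parameters["nr_epochs"],
        batch_size=parameters["batch_size"],
        callbacks=[WandbCallback(save_model=True)],
    )

    run.finish()

    # The returned dictionary ends up as the run's output in UbiOps
    return {"final_val_accuracy": float(history.history["val_accuracy"][-1])}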
So, to run our training scripts, we'll first need to set up the code environment. This environment includes the Python version we're working with, the dependencies used in our training code, and pre-installed CUDA drivers. After setting up this environment once, we can reuse it to run different training jobs with an identical runtime.

The environment interface in UbiOps. Environments contain the dependencies and files required for your scripts to run, and can be reused across different training jobs and deployments. Here, the environment only contains some pip packages.
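As a rough sketch, creating such an environment through the UbiOps Python client could look like the following. The API token, project name, environment name, base environment, and zip file name are placeholders; pick a CUDA-enabled base environment that is available in your project.

import ubiops

API_TOKEN = "Token ..."          # your UbiOps API token (placeholder)
PROJECT_NAME = "mlops-demo"      # your UbiOps project name (placeholder)

configuration = ubiops.Configuration(host="https://api.ubiops.com/v2.1")
configuration.api_key["Authorization"] = API_TOKEN
api_client = ubiops.ApiClient(configuration)
core_api = ubiops.CoreApi(api_client)

# Create a reusable environment on top of a CUDA-enabled Python base image
core_api.environments_create(
    project_name=PROJECT_NAME,
    data=ubiops.EnvironmentCreate(
        name="cifar-training-env",
        base_environment="python3-11-cuda",  # example name; list the base environments in your project
        display_name="CIFAR training environment",
    ),
)

# Upload a zip containing a requirements.txt (our pip packages) as an environment revision
core_api.environment_revisions_file_upload(
    project_name=PROJECT_NAME,
    environment_name="cifar-training-env",
    file="training_environment_package.zip",
)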
We'll then create a UbiOps experiment (which groups different training runs) and select an NVIDIA T4 GPU as its compute resource. When we submit a training job, the training code is run on top of our environment on the selected compute resource. Within this experiment, we can easily try out different training scripts, or run the same training code with different hyperparameters. In this example, we'll do the latter.
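Here is a minimal sketch of creating that experiment with the UbiOps client, reusing the api_client and environment from the sketch above. The experiment name and instance type are example values, so check which GPU instance types your project offers (and note that newer client versions may name these fields slightly differently).

EXPERIMENT_NAME = "cifar-hyperparameter-search"

# The training client sits on top of the regular API client
training_instance = ubiops.Training(api_client=api_client)

# Create an experiment that runs on an NVIDIA T4 GPU instance type
training_instance.experiments_create(
    project_name=PROJECT_NAME,
    data=ubiops.ExperimentCreate(
        name=EXPERIMENT_NAME,
        description="CIFAR-10 training runs with different hyperparameters",
        instance_type="16384mb_t4",        # example T4 instance type name
        environment="cifar-training-env",  # the environment we created earlier
        default_bucket="default",          # bucket used to store run artifacts
    ),
)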
Launching our training jobs on UbiOps
Within our experiment, we will run three training jobs with different batch sizes, optimizers, and random rotations of our images. We can initiate the runs via the UI, or via a few lines of code:
data_experiments = [
    {"batch_size": 32, "nr_epochs": 50, "optimizer": "adam", "random_rotation": 0.1},
    {"batch_size": 64, "nr_epochs": 50, "optimizer": "adam", "random_rotation": 0.25},
    {"batch_size": 16, "nr_epochs": 50, "optimizer": "sgd", "random_rotation": 0.25},
]

for index, data_experiment in enumerate(data_experiments):
    new_run = training_instance.experiment_runs_create(
        project_name=PROJECT_NAME,
        experiment_name=EXPERIMENT_NAME,
        data=ubiops.ExperimentRunCreate(
            name=f"training-run-{index}",
            description=f'Trying out a run with {data_experiment["nr_epochs"]} epochs and batch size {data_experiment["batch_size"]}',
            training_code='training_code/train.py',
            parameters=data_experiment,
        ),
    )
We can check the statuses and logs of our training runs in the UI of UbiOps:

From the experiment interface on UbiOps, we see that two training jobs are running, while a third is still pending. This indicates that the on-demand GPU is still booting up.
Using Weights & Biases, we get a more detailed understanding of the performance of the different runs, both during and after training. Note that the GPUs are being used at full capacity!

From the Weights & Biases interface, we can track the metrics of our training runs while they are executing. Additionally, we get insights into metrics related to GPU usage!
Using Weights & Biases to store our best model
Using this information, we can select the model with the highest final validation accuracy. We use the Weights & Biases API to download the best model from the best run, and to upload this model to a production project in Weights & Biases, which we called 'production-environment' here:
import wandb

# Download the best model from our best run
best_training_run = " "  # fill in the ID of the best training run here
tmp_folder = 'artifact_folder'

# Specifying the key with the api_key argument gives errors sometimes
wandb_api = wandb.Api()

# WANDB_TRAINING_PROJECT is the name of the W&B project used for the training runs
print(f"{WANDB_TRAINING_PROJECT}/model-{best_training_run}:latest")
artifact_obj = wandb_api.artifact(f"{WANDB_TRAINING_PROJECT}/model-{best_training_run}:latest", type='model')
artifact_obj.download(tmp_folder)

# And upload the artifact to our production environment
with wandb.init(project="production-environment", job_type="model") as run:
    artifact = wandb.Artifact('production-models', type='model')
    artifact.add_dir(f"{tmp_folder}")
    run.log_artifact(artifact)
Deploying our Model on UbiOps
Next, we're going to deploy the model and create an inference endpoint on UbiOps. This is called a 'deployment' in UbiOps and consists of the Python code below, again executed in an environment with the proper dependencies loaded. We use the initialization function of our deployment to grab the latest model from the production project and load it into memory. The request function is used to classify a new input image. The final deployment code looks as follows:
import numpy as np
import tensorflow as tf
import wandb
from imageio.v3 import imread


class Deployment:

    def __init__(self):
        print("Initialising deployment")

        # Make a connection to the W&B API here and download the latest production model
        wandb_api = wandb.Api()
        artifact_obj = wandb_api.artifact('production-environment/production-models:latest')
        artifact_path = 'artifact_folder'
        artifact_obj.download(artifact_path)

        self.model = tf.keras.models.load_model(artifact_path)
        self.cifar_classes = ("Airplane", "Automobile", "Bird", "Cat", "Deer",
                              "Dog", "Frog", "Horse", "Ship", "Truck")

    def request(self, data):
        print("Processing request")

        x = imread(data['image'])

        # Check that the image is 32x32 pixels
        assert x.shape == (32, 32, 3)

        # Convert to a 4D tensor to feed into our model
        x = x.reshape(1, 32, 32, 3)
        x = x.astype(np.float32) / 255

        out = self.model.predict(x)
        prediction = self.cifar_classes[int(np.argmax(out))]

        # Here we set our output parameters in the form of a JSON
        return {'prediction': prediction}
We can upload our deployment files to UbiOps, and specify some scaling and request retention settings along the way. We decide to run on a CPU instance with 1 GB of RAM, and set maximum_instances = 3 so that we can handle peak workloads. We also store request input and output, so that we can measure the performance of our model in production.
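As a sketch, assuming the same UbiOps client as before, creating the deployment and a version with these settings could look like the following. The deployment name, environment name, instance type, and zip file name are example values, and the W&B API key is assumed to be available to the deployment as an environment variable.

# Create the deployment with a file input and a string output
core_api.deployments_create(
    project_name=PROJECT_NAME,
    data=ubiops.DeploymentCreate(
        name="cifar-classifier",
        input_type="structured",
        output_type="structured",
        input_fields=[{"name": "image", "data_type": "file"}],
        output_fields=[{"name": "prediction", "data_type": "string"}],
    ),
)

# Create a version with the scaling and retention settings described above
core_api.deployment_versions_create(
    project_name=PROJECT_NAME,
    deployment_name="cifar-classifier",
    data=ubiops.DeploymentVersionCreate(
        version="v1",
        environment="cifar-inference-env",  # CPU environment with tensorflow, wandb and imageio
        instance_type="1024mb",             # 1 GB RAM CPU instance (example name)
        minimum_instances=0,                # scale to zero when idle
        maximum_instances=3,                # handle peak workloads
        request_retention_mode="full",      # store request input and output
    ),
)

# Upload the zipped deployment package (deployment.py + requirements) to this version
core_api.revisions_file_upload(
    project_name=PROJECT_NAME,
    deployment_name="cifar-classifier",
    version="v1",
    file="deployment_package.zip",
)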
After uploading our deployment package to UbiOps, we can send requests to the model API. We send this (blurry) image of a frog to the inference deployment and, as the result shows, the image is classified correctly!
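A request like this can be sent with a few lines of Python. This is a minimal sketch that assumes the upload_file helper from the UbiOps client utilities; the file name, deployment name, and printed result are illustrative.

# Upload the image to a UbiOps bucket and get back a file URI we can pass as input
file_uri = ubiops.utils.upload_file(
    client=api_client,
    project_name=PROJECT_NAME,
    file_path="frog.png",
)

# Send the request to the deployment's API endpoint
request = core_api.deployment_requests_create(
    project_name=PROJECT_NAME,
    deployment_name="cifar-classifier",
    data={"image": file_uri},
)
print(request.result)  # e.g. {'prediction': 'Frog'}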

Conclusion
And that's it! We've used the training insights from Weights & Biases, and the compute resources and deployment possibilities from UbiOps, to create a training and inference pipeline, resulting in a live and scalable model. We can reach our model via its API endpoint, provided we supply the correct authentication credentials.
After setting up the baseline model, you can easily add new deployment versions and tweak the scaling and other operational settings. You can scale the deployment down to zero during development, and scale it up if you want to run multiple inference jobs in parallel! Using the monitoring tabs, you can actively track when and how often your model was requested. If you find that your model is underperforming due to, for example, data drift, or if you find other ways of improving the model, you can easily initiate more training runs and store a better model in the default location in Weights & Biases. This way, you ensure that the best model is always used in production!
Do you want to try out this workflow for your own training runs? Feel free to sign up at UbiOps and let us know what you think!