Automating Model Evaluation With W&B Launch
Trigger automatic model evaluation whenever you push a new version of your model
In this report, you will learn how to use Weights & Biases Launch to trigger automatic evaluation of your freshly trained models. This is a powerful feature when building ML pipelines, where different teams may be responsible for training new models and for evaluating the performance of the deployed ones.
Training a model
For this example, we will train a simple model on the FashionMNIST dataset. You can check this article to learn more about the dataset and how to build a state-of-the-art classifier for it. We will use a codebase that lives here; it consists of a training script, train_fmnist.py, that trains a neural network on FashionMNIST using standard PyTorch.
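To give a sense of what such a script contains, here is a minimal, self-contained sketch of a FashionMNIST training loop instrumented with W&B. It is an illustration rather than the actual contents of train_fmnist.py; the tiny CNN, optimizer, and hyperparameters are assumptions.

import torch
import torch.nn as nn
import torch.nn.functional as F
from torch.utils.data import DataLoader
from torchvision import datasets, transforms
import wandb

config = dict(batch_size=128, epochs=1, lr=1e-3)

with wandb.init(project="fashion-launch", config=config) as run:
    config = run.config

    # standard FashionMNIST dataloader
    ds = datasets.FashionMNIST(
        "data", train=True, download=True, transform=transforms.ToTensor()
    )
    dl = DataLoader(ds, batch_size=config.batch_size, shuffle=True)

    # a tiny CNN stands in for the real model used in train_fmnist.py
    model = nn.Sequential(
        nn.Conv2d(1, 16, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
        nn.Flatten(), nn.Linear(16 * 14 * 14, 10),
    )
    optimizer = torch.optim.Adam(model.parameters(), lr=config.lr)

    for epoch in range(config.epochs):
        for images, labels in dl:
            loss = F.cross_entropy(model(images), labels)
            optimizer.zero_grad()
            loss.backward()
            optimizer.step()
            run.log({"train/loss": loss.item()})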
Linking the model to the Model Registry
The essential bit of code to take into account is the save_model function called at the end of training:
def save_model(model, model_name, models_folder="models", metadata=None, link=False):
    """Save the model to wandb as an artifact

    Args:
        model (nn.Module): Model to save.
        model_name (str): Name of the model.
        models_folder (str, optional): Folder to save the model. Defaults to "models".
        metadata (dict, optional): Metadata to save with the model. Defaults to None.
        link (bool, optional): If True, links the model to the model registry. Defaults to False.
    """
    # creates a folder to save the model
    model_name = f"{wandb.run.id}_{model_name}"
    file_name = Path(f"{models_folder}/{model_name}.pth")
    file_name.parent.mkdir(parents=True, exist_ok=True)

    # save model weights to wandb.Artifact
    model = model.to("cpu")
    torch.save(model.state_dict(), file_name)
    at = wandb.Artifact(
        model_name,
        type="model",
        description="Model checkpoint from TIMM",
        metadata=metadata,
    )
    at.add_file(file_name)
    wandb.log_artifact(at)

    # optionally link the new saved model to the Model Registry
    if link:
        wandb.run.link_artifact(at, 'model-registry/FMNIST_Classifier')
This function saves the model as a wandb.Artifact and then links it to the registered model called FMNIST_Classifier. This step is crucial, as the automation we set up later will evaluate every new model version linked to this registered model.
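For illustration, calling this function at the end of training might look like the snippet below; the model name and metadata values are placeholders.

# at the end of train_fmnist.py: save the checkpoint and link it to the registry
save_model(
    model,
    model_name="resnest14d",           # placeholder model name
    metadata={"test_accuracy": 0.91},  # placeholder metadata
    link=True,                         # link this version to model-registry/FMNIST_Classifier
)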
Eval Job
Now that we have a model in the Model Registry, we can use the eval_fmnist.py script to perform the evaluation. Let's run this script once:
# defaults = SimpleNamespace(
#     bs=128,
#     num_workers=0,
#     device="cuda:0" if torch.cuda.is_available() else "cpu",
#     model_artifact="capecape/fashion-launch/uvef8vsn_resnest14d:v0",
#     log_images=False,
# )

$ python eval_fmnist.py
This will run the script with the default arguments and create a job for us in the Jobs tab of the project. We will be able to override all of these arguments later when performing the evaluation; most importantly, we want to override the model_artifact we are evaluating. To be able to inject new parameters into the evaluation script, it is mandatory that you write your code so that the config is managed by Weights & Biases.
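As an illustration of what consuming that parameter can look like, here is a minimal sketch of an evaluation script that reads model_artifact from wandb.config and downloads the checkpoint. It is not the actual contents of eval_fmnist.py; the defaults shown are taken from above, but the evaluation loop and logged metric are simplified assumptions.

from types import SimpleNamespace
from pathlib import Path
import torch
import wandb

# defaults mirror the ones shown above; the exact values are illustrative
defaults = SimpleNamespace(
    bs=128,
    device="cuda:0" if torch.cuda.is_available() else "cpu",
    model_artifact="capecape/fashion-launch/uvef8vsn_resnest14d:v0",
)

with wandb.init(project="fashion-launch", job_type="eval", config=vars(defaults)) as run:
    # read the (possibly overridden) config back from the run
    config = run.config

    # fetch the model checkpoint that was logged during training
    artifact = run.use_artifact(config.model_artifact, type="model")
    ckpt_dir = Path(artifact.download())
    state_dict = torch.load(next(ckpt_dir.glob("*.pth")), map_location="cpu")

    # ... rebuild the model, load the weights, and run the evaluation loop ...
    run.log({"eval/accuracy": 0.0})  # placeholder metric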
Instrumenting your code with W&B
Any time you want to re-use a job (remember, a job is just a fancy name for a W&B-instrumented script that can be "launched"), you need to make sure that the code passes the config to wandb and back to your script. This is explained in the documentation under "Making your code job friendly".
This is achieved by reading the wandb.config attribute inside your script: when Launch re-runs the job with overridden parameters, the new values are injected into wandb.config, so your script picks them up automatically:
config = dict(batch_size=128, model_name="resnet18")

# creates a run (that will create a job)
wandb.init(project="my_automation_project", config=config)

# re-assign the config back from the wandb run itself
config = wandb.config

# do the training/eval/whatever
train_func(config)
Renaming the eval Job

By default, the job name is derived from the GitHub repo where the code lives. You can rename the job afterward.
A lot happens under the hood when a job is created: W&B gathers information about your script's environment and stores it next to your run, so the eval_fmnist.py script can later be re-run seamlessly under the same conditions.

Let's rename this... eval_fmnist seems nice!
Launching the job manually
You can manually launch this job by clicking the Launch button and sending it to a running queue. To do that, you can copy the parameters from a previous run and manually inject any changes you like.
Creating an Automation
Let's create an automation to run the evaluation job when a new model is linked to the registered model:
In the Model Registry, select the registered model you want to automate, click on the three dots, and select + New automation;

A new dialog box is presented; select the type of automation you want, in our case, one triggered when a new version is linked to the registered model:

Select the job you want to execute; in our case, the job we renamed eval_fmnist.
