Introducing run metrics notifications for W&B Models
We're making it easier to keep track of your model metrics as you train. Here's what you need to know.
Created on April 2|Last edited on April 4

Model training and fine-tuning require constant attention, but no one can monitor dashboards around the clock. That's why W&B Models has added customizable alerts for tracking your most important metrics. Now, even when you're away from your workspace, you can receive instant notifications via Slack when important metrics increase or decrease by a certain percentage during a run or when critical thresholds are reached. Our interactive dashboards help you visualize training data and run metrics notifications, ensuring you never miss what matters, whether your run lasts minutes, hours, days, or weeks.
Alerting functionality is not new for W&B Models, but the new run metrics notifications feature allows a level of nuance not previously available. Built using W&B Automations, these notifications are triggered when certain user-specified conditions—either absolute or based on comparisons with previous performance metrics—are met.
Run metrics notifications
Run metrics notifications are more flexible than the previous alerting functionality, starting with who gets notified: only the right people receive each notification, rather than every user receiving every one. Large teams running numerous training and fine-tuning projects can stay selectively informed. Run metrics notifications allow filtering runs by name through regular expressions and routing alerts to designated Slack channels or specific team members. This targeted approach means machine learning engineers and data scientists only receive updates about the metrics and conditions that matter to them.
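To make the idea of regex-based filtering and routing concrete, here is a minimal sketch in plain Python. It is not how W&B implements the feature; the patterns, channel names, and the `channels_for_run` helper are all hypothetical and only illustrate how a run name could be matched against regular expressions to pick a destination.

```python
import re

# Hypothetical routing table: run-name pattern -> Slack channel.
# Neither the patterns nor the channel names come from W&B.
ROUTES = [
    (re.compile(r"^llama-finetune-.*"), "#finetune-alerts"),
    (re.compile(r"^vision-(train|eval)-\d+$"), "#vision-team"),
]


def channels_for_run(run_name: str) -> list[str]:
    """Return every channel whose pattern matches the run name."""
    return [channel for pattern, channel in ROUTES if pattern.search(run_name)]


print(channels_for_run("llama-finetune-lr3e-5"))
print(channels_for_run("vision-train-042"))
print(channels_for_run("scratch-experiment"))
```

A run whose name matches no pattern maps to an empty list, which is the point of the filter: nobody is notified about runs they did not ask to follow.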
Run metrics notifications can be automatically triggered when one of the following events occurs:
A run metrics change threshold is met
This condition allows a user to receive a notification if there is a significant change in a metric’s trend during a training or fine-tuning run. This might alert a user or team that the model is not converging or that loss values have plateaued, in which case it may make more sense to stop the run than to waste precious resources.

A run metrics absolute threshold is met
This condition allows a user to receive a notification if the average value for a specified metric across a specified window rises above or falls below a specified threshold during a training or fine-tuning run. Examples of when this might be useful include monitoring accuracy or loss over time to determine whether it might be worth ending a run early or checking the GPU utilization or temperature over time to ensure the configuration can adequately handle the hardware demands of the run.
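The two conditions above can be sketched in a few lines of Python. This is an illustration of the logic only, not W&B's implementation: the function names are hypothetical, and comparing the current window against the window immediately before it is an assumption about what "comparison with previous performance" could mean.

```python
from statistics import mean


def absolute_threshold_met(values, window, threshold, direction="above"):
    """True if the average of the last `window` values crosses a fixed threshold."""
    if not values:
        return False  # nothing logged yet
    avg = mean(values[-window:])
    return avg > threshold if direction == "above" else avg < threshold


def change_threshold_met(values, window, percent, direction="decrease"):
    """True if the current window's average moved by at least `percent`
    relative to the average of the window immediately before it (assumed)."""
    if len(values) < 2 * window:
        return False  # not enough history to compare two full windows
    current = mean(values[-window:])
    previous = mean(values[-2 * window:-window])
    if previous == 0:
        return False  # relative change is undefined
    change = (current - previous) / abs(previous) * 100
    return change >= percent if direction == "increase" else change <= -percent


# Loss halves between the two windows, so a 20% decrease condition fires.
print(change_threshold_met([1.0, 1.0, 0.5, 0.5], window=2, percent=20))
# Average GPU temperature over the last three readings is 87, above 85.
print(absolute_threshold_met([70, 82, 88, 91], window=3, threshold=85))
```

Averaging over a window rather than reacting to a single value keeps one noisy step from triggering a notification.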

Additionally, users and teams have two options for the type of action that’s kicked off when a run metrics condition is met:
Slack notification
When the specified run metrics threshold has been reached, a Slack notification including the name and description of the Automation will automatically be sent to the proper channel or individual. Clicking on the notification takes you directly to the originating project so analysis can begin immediately.

Webhooks
Whether you are kicking off an automated workflow using GitHub Actions, sending out email or text message alerts, or interacting with a third-party API, webhooks give you a high degree of freedom when reacting to metric-driven events.
As with other W&B Automations, you can define a JSON payload that is sent to a webhook, including dynamic values for keys directly related to run metrics notifications. Webhooks are frequently used along with Automations to integrate W&B Models with CI/CD pipelines. Adding an artifact to W&B Registry, applying an alias to an artifact, and now a run metrics notification firing during experiment tracking can each initiate or tie directly into CI/CD processes, creating more streamlined continuous integration and continuous deployment workflows for AI and machine learning.
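As a rough illustration of the receiving side of a webhook, the sketch below parses a JSON body and chooses a follow-up action. Every key name in the payload (`event_type`, `project`, `run_name`, `metric`, `value`), the example values, and the action strings are hypothetical; the actual keys and dynamic values available are defined in the W&B Automations docs, not here.

```python
import json

# Hypothetical payload, as a receiving service might see it after the
# dynamic values have been filled in. Every key name here is an assumption.
EXAMPLE_BODY = json.dumps({
    "event_type": "RUN_METRIC_THRESHOLD",
    "project": "my-project",
    "run_name": "llama-finetune-lr3e-5",
    "metric": "val/loss",
    "value": 2.71,
})


def handle_webhook(body: str) -> str:
    """Parse the JSON body and pick a follow-up action for a CI/CD system."""
    payload = json.loads(body)
    missing = {"event_type", "run_name", "metric"} - payload.keys()
    if missing:
        raise ValueError(f"payload is missing keys: {sorted(missing)}")
    # Hypothetical policy: loss-related events request a stop, others only notify.
    if payload["metric"].endswith("loss"):
        return f"stop-run:{payload['run_name']}"
    return f"notify-only:{payload['run_name']}"


print(handle_webhook(EXAMPLE_BODY))
```

Validating the expected keys up front makes a misconfigured payload fail loudly instead of silently triggering the wrong workflow.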

Run metrics notifications are built using the Automations engine, which gives team members visibility into existing notifications. Rather than burying alerting calls in individual code bases, these automations live directly in your W&B Models project. This means that they are accessible from a single, easy-to-find location where they can be viewed, created, and edited by authorized team members.
AI training runs demand time, but they shouldn't demand constant attention. Our new W&B Models run metrics notifications can free you up from staring non-stop at your monitor, sending alerts only when problems arise. The result? You stay informed about critical issues while reclaiming your time for other projects, dinner with friends and family, or just a good night’s sleep.
Create your first run metrics notification
Run metrics notifications are available for our Pro and Enterprise edition SaaS users. You can learn more about run metrics notifications and Automations in the W&B Models product docs.
Team and User Alerts
Now that we’ve discussed how run metrics notifications free you from manually monitoring your AI and machine learning workloads, let’s quickly review the alerting features already available in W&B Models.

Alerting can be configured on the team settings page or the user settings page, for teams and users respectively. The pre-built alerts, which can only be sent via email and Slack, include:
- Run finished: When any run executed by the user or team completes, an alert is sent out to specified email address(es) and/or Slack channels.
- Run crashed after x minutes/hours/days/weeks: When any run executed by the user or team fails after a specified length of time, an alert is sent out to specified email address(es) and/or Slack channels.
Users can also trigger an alert at any point during a run when using the SDK. There are a number of situations in which a user might want to call wandb.alert(), for example when the gradients in a training loop start to blow up (report NaN) or when a step in the training pipeline completes.
import wandb
from wandb import AlertLevel

run = wandb.init()

if acc < threshold:
    run.alert(
        title="Low accuracy",
        text=f"Accuracy {acc} is below the acceptable threshold {threshold}",
        level=AlertLevel.WARN,
        wait_duration=300,
    )
While wandb.alert() allows you to define any custom conditions to trigger a notification, it is limited to just email and Slack messages and it must be maintained and supported in Python code. Additionally, while defining logic to trigger notifications based on absolute and comparative thresholds is certainly possible, this scenario is now more easily addressed using the new run metrics notifications.
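To show what maintaining this kind of check in Python code can look like, here is a small sketch of the NaN case mentioned earlier. The `alert_on_bad_loss` helper and the `fake_alert` stand-in are hypothetical; in a real run you would pass a callable such as `run.alert`, which accepts `title` and `text` arguments.

```python
import math


def alert_on_bad_loss(loss, step, alert):
    """Call `alert` if the loss is NaN or infinite; return True if it fired.

    `alert` is any callable accepting `title` and `text` keyword arguments.
    """
    if math.isnan(loss) or math.isinf(loss):
        alert(title="Loss diverged", text=f"Loss became {loss} at step {step}")
        return True
    return False


# Stand-in for a real alert function so the sketch runs without a W&B account.
sent = []


def fake_alert(title, text):
    sent.append((title, text))


for step, loss in enumerate([0.9, 0.7, float("nan")]):
    if alert_on_bad_loss(loss, step, fake_alert):
        break  # stop training once the loss is unusable

print(sent)
```

Even this small check has to be written, tested, and kept in sync in every training script that needs it, which is the maintenance cost the paragraph above describes.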
You can learn more about how to implement wandb.alert() in your code and all of its capabilities in the Models product docs.
Tags: Articles, W&B Features