Introducing Weave Guardrails
Safeguard your AI applications with pre-built safety and quality scorers
Today, we're excited to announce the launch of Weave Guardrails, a new feature that helps developers improve the safety and quality of their AI-powered applications. As organizations rapidly deploy AI systems, ensuring those applications operate safely and deliver high-quality outputs has become a pressing challenge. Weave Guardrails addresses this need with a developer-friendly API for detecting violations, alongside a comprehensive suite of pre-built scorer models.

Running coherence and hallucination guardrails against RAG outputs
Programmable safeguards for real-time protection
Weave Guardrails enables developers to implement programmable safeguards. When a guardrail is triggered, the application can be routed into custom exception-handling code, preventing harmful outputs from reaching end users and protecting both users and brand reputation. Every detection is automatically logged in Weave, allowing teams to monitor scorer performance over time and continuously improve application quality.
The example code below shows an LLM call guarded by the Weave toxicity scorer:
import weave
from weave.scorers import WeaveToxicityScorerV1

toxicity_scorer = WeaveToxicityScorerV1()

@weave.op
def call_llm(prompt: str) -> str:
    """Generate text using an LLM."""
    return prompt.upper()

async def generate_safe_response(prompt: str) -> str:
    # Call your AI system
    result, call = call_llm.call(prompt)
    # Check the toxicity of the output of call_llm
    safety = await call.apply_scorer(toxicity_scorer)
    if not safety.result.passed:
        return f"I cannot respond, guardrail triggered: {safety.result.metadata}"
    return result
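To exercise the guardrail end to end, the async function can be driven with asyncio. The snippet below is a minimal sketch; the prompt string is purely illustrative.

import asyncio

# Run the guarded call: a benign prompt should pass the toxicity check,
# while a toxic one returns the fallback message instead of the raw output
response = asyncio.run(generate_safe_response("Summarize the benefits of unit testing"))
print(response)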
Evaluating outputs with Weave Guardrails
Weave Guardrails also introduces a collection of specialized scorers that evaluate both the inputs and outputs of AI applications. These include safety scorers for detecting toxicity, bias, PII exposure, and hallucinations, alongside quality scorers that measure coherence, fluency, and context relevance. For RAG applications specifically, we've developed a trustworthiness scorer: a composite scorer that combines five different scorers to provide a clear trust-level assessment.
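To give a feel for how these scorers compose, the sketch below applies both a coherence and a toxicity scorer to the output of a RAG-style call and only returns the answer when every guardrail passes. It reuses the toxicity_scorer from the earlier example and assumes the scorer class WeaveCoherenceScorerV1 follows the same V1 naming pattern and the same call.apply_scorer flow shown above; rag_answer is a hypothetical stand-in for your retrieval pipeline.

from weave.scorers import WeaveCoherenceScorerV1  # assumed name, following the V1 naming pattern

coherence_scorer = WeaveCoherenceScorerV1()

@weave.op
def rag_answer(question: str) -> str:
    """Hypothetical stand-in for a RAG pipeline that retrieves context and generates an answer."""
    return f"Answer to: {question}"

async def generate_checked_answer(question: str) -> str:
    result, call = rag_answer.call(question)
    # Apply multiple scorers to the same call; every detection is logged in Weave
    coherence = await call.apply_scorer(coherence_scorer)
    toxicity = await call.apply_scorer(toxicity_scorer)
    if not (coherence.result.passed and toxicity.result.passed):
        return "Response withheld: a guardrail was triggered."
    return result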
Fine-tuned models available for local deployment
These v1 scorers are small, lightweight language models that can run locally with low latency. As part of this launch, we're releasing several fine-tuned models developed by Weights & Biases alongside carefully selected open-source models. For example, our WeaveFluencyScorerV1 model is a fine-tune of ModernBert-Base from AnswerDotAI, while the WeaveToxicityScorerV1 model uses the Celadon model from PleIAs, a highly performant open-source toxicity detection model. We look forward to hearing how these models perform for users and to continuing to improve them.
All model weights are publicly available as W&B Artifacts and are downloaded automatically when a scorer is instantiated. Each scorer's training and evaluation are documented in an individual W&B Report, which can be found here.
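As a small sketch of that download-on-instantiation behavior: creating a scorer object pulls its weights from W&B Artifacts, after which it can score text directly outside of a guarded call. The output keyword and the synchronous score call below are assumptions and may differ between scorers or Weave versions.

from weave.scorers import WeaveToxicityScorerV1

# Instantiating the scorer downloads its weights from W&B Artifacts on first use
toxicity_scorer = WeaveToxicityScorerV1()

# Score a piece of text directly; the `output` keyword is an assumption,
# and in some versions score may be async and need to be awaited
result = toxicity_scorer.score(output="Thanks for the thoughtful question!")
print(result)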

Evaluating the Celadon toxicity model in Weave
Weave Guardrails is available now for all Weave users. To get started, try our Colab or check out our technical documentation. We look forward to hearing how Guardrails can help you build safer, higher-quality AI applications faster.