Introducing Weave Guardrails
Safeguard your AI applications with pre-built safety and quality scorers
Today, we're excited to announce the launch of Weave Guardrails, a new feature that helps developers improve the safety and quality of their AI-powered applications. As organizations rapidly deploy AI systems, ensuring those applications operate safely and deliver high-quality outputs has become a pressing challenge. Weave Guardrails addresses this need with a developer-friendly API for detecting violations, alongside a comprehensive suite of pre-built scorer models.

Running coherence and hallucination guardrails against RAG outputs
Programmable safeguards for real-time protection
Weave Guardrails enables developers to implement programmable safeguards. When a guardrail is triggered, the application can be routed into custom exception-handling code, preventing harmful outputs from reaching end users and protecting both users and brand reputation. Every detection is automatically logged in Weave, allowing teams to monitor scorer performance over time and continuously improve application quality.
The example code below shows an LLM call guarded by the Weave toxicity scorer:
import weave
from weave.scorers import WeaveToxicityScorerV1

toxicity_scorer = WeaveToxicityScorerV1()

@weave.op
def call_llm(prompt: str) -> str:
    """Generate text using an LLM."""
    return prompt.upper()

async def generate_safe_response(prompt: str) -> str:
    # Call your AI system
    result, call = call_llm.call(prompt)
    # Check the toxicity of the output of call_llm
    safety = await call.apply_scorer(toxicity_scorer)
    if not safety.result.passed:
        return f"I cannot respond, guardrail triggered: {safety.result.metadata}"
    return result
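To exercise the guardrail end to end, the async function can be driven with asyncio. The snippet below is a minimal sketch; the prompt string is purely illustrative.

import asyncio

# Run the guarded call: a benign prompt should pass the toxicity check,
# while a toxic one returns the fallback message instead of the raw output
response = asyncio.run(generate_safe_response("Summarize the benefits of unit testing"))
print(response)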
Evaluating outputs with Weave Guardrails
Weave Guardrails also introduces a collection of specialized scorers that evaluate both the inputs and outputs of AI applications. These include safety scorers for detecting toxicity, bias, PII exposure, and hallucinations, alongside quality scorers that measure coherence, fluency, and context relevance. For RAG applications specifically, we've developed a trustworthiness scorer: a composite scorer that combines five different scorers to provide a clear trust-level assessment.
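To give a feel for how these scorers compose, the sketch below applies both a coherence and a toxicity scorer to the output of a RAG-style call and only returns the answer when every guardrail passes. It reuses the toxicity_scorer from the earlier example and assumes the scorer class WeaveCoherenceScorerV1 follows the same V1 naming pattern and the same call.apply_scorer flow shown above; rag_answer is a hypothetical stand-in for your retrieval pipeline.

from weave.scorers import WeaveCoherenceScorerV1  # assumed name, following the V1 naming pattern

coherence_scorer = WeaveCoherenceScorerV1()

@weave.op
def rag_answer(question: str) -> str:
    """Hypothetical stand-in for a RAG pipeline that retrieves context and generates an answer."""
    return f"Answer to: {question}"

async def generate_checked_answer(question: str) -> str:
    result, call = rag_answer.call(question)
    # Apply multiple scorers to the same call; every detection is logged in Weave
    coherence = await call.apply_scorer(coherence_scorer)
    toxicity = await call.apply_scorer(toxicity_scorer)
    if not (coherence.result.passed and toxicity.result.passed):
        return "Response withheld: a guardrail was triggered."
    return result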
Fine-tuned models available for local deployment
These v1 scorers are small, lightweight language models that can run locally with low latency. As part of this launch, we're releasing several fine-tuned models developed by Weights & Biases alongside carefully selected open-source models. For example, our WeaveFluencyScorerV1 model is a fine-tune of ModernBert-Base from AnswerDotAI, while the WeaveToxicityScorerV1 model uses the Celadon model from PleIAs, a highly performant open-source toxicity detection model. We look forward to hearing how these models perform for users and to continuing to improve them.
All model weights are publicly available as W&B Artifacts and are downloaded automatically when a scorer is instantiated. Each scorer's training and evaluation are documented in an individual W&B Report, which can be found here.
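As a small sketch of that download-on-instantiation behavior: creating a scorer object pulls its weights from W&B Artifacts, after which it can score text directly outside of a guarded call. The output keyword and the synchronous score call below are assumptions and may differ between scorers or Weave versions.

from weave.scorers import WeaveToxicityScorerV1

# Instantiating the scorer downloads its weights from W&B Artifacts on first use
toxicity_scorer = WeaveToxicityScorerV1()

# Score a piece of text directly; the `output` keyword is an assumption,
# and in some versions score may be async and need to be awaited
result = toxicity_scorer.score(output="Thanks for the thoughtful question!")
print(result)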

Evaluating the Celadon toxicity model in Weave
Weave Guardrails is available now for all Weave users. To get started, try our Colab or check out our technical documentation. We look forward to hearing how Guardrails can help you build safer, higher-quality AI applications faster.