Weave

Deliver AI applications with confidence

W&B Weave helps developers evaluate, monitor, and iterate on their AI applications to continuously improve quality, latency, cost, and safety. Run robust evaluations, keep pace with new LLMs, debug your applications easily, and monitor production performance—all while collaborating securely.

W&B Weave is framework- and LLM-agnostic, with a wide range of pre-built integrations

Evaluations

Experiment with LLMs, prompts, RAG, agents, and guardrails using rigorous evaluations to optimize your AI application’s performance across multiple dimensions—quality, latency, cost, and safety. Weave offers powerful visualizations, automatic versioning, leaderboards, and a playground to precisely measure and rapidly iterate on improvements. Centrally track all evaluation data to enable reproducibility, lineage tracking, and collaboration.

Learn more

Production monitoring and debugging

Weave automatically logs all inputs, outputs, code, and metadata in your application and organizes the data into a trace tree that you can easily navigate and analyze to debug issues. Use real-time traces to monitor your app in production and improve performance continuously. Score live incoming production traces with online evals for monitoring without impacting your app’s performance (sign up for online evals preview). Develop multimodal apps—Weave logs text, documents, code, HTML, chat threads, images, and audio, with support for video and other modalities coming soon.

Learn more

Start with our scorers

Weave provides pre-built LLM-based scorers for common tasks

Or bring your own

Plug off-the-shelf third-party scoring solutions into Weave, or write your own

Scoring

Weave automatically tracks quality scores, latency, and cost metrics for every trace. Weave offers built-in scorers for common metrics like hallucination, moderation, and context relevancy. Customize them or build your own from scratch. Scorers can use any LLM as a judge to generate the metrics.
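Conceptually, a scorer is just a function that maps a model output (plus any inputs it needs, such as retrieved context) to a metric dictionary. A minimal sketch of that shape, using simple token overlap as a stand-in for an LLM judge (the function name and metric key are illustrative, not Weave's built-in API):

```python
def context_relevancy(output: str, context: str) -> dict:
    """Toy context-relevancy scorer: the fraction of output tokens that
    also appear in the retrieved context. A production scorer could
    instead prompt an LLM judge and return its verdict in the same shape."""
    out_tokens = set(output.lower().split())
    ctx_tokens = set(context.lower().split())
    overlap = len(out_tokens & ctx_tokens) / max(len(out_tokens), 1)
    return {"context_relevancy": overlap}

print(context_relevancy("Paris is the capital", "Paris capital of France"))
# → {'context_relevancy': 0.5}
```

Because the scorer returns a plain dictionary of named metrics, the same function shape works for built-in, customized, or fully custom scorers.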

Human feedback

Collect human feedback from users and experts for real-life testing and evaluation. Feedback can be simple thumbs-up/down ratings and emojis or detailed qualitative annotations. Use our annotation template builder to tailor the labeling interface for consistency while improving efficiency and quality.

❌ Toxicity
❌ Bias
❌ Hallucination

And more …

Guardrails (preview in Q1 2025)

Protect your brand and end users by implementing guardrails with Weave. Our out-of-the-box filters detect harmful outputs and prompt attacks. Once an issue is detected, pre- and post-hooks trigger safeguards that steer the response in line with your company guidelines and policies.
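The pre- and post-hook pattern described above can be sketched in plain Python. Everything here is illustrative, not Weave's preview API: the hook names, the banned-term list, and the fallback message are placeholders for real moderation filters and company policy.

```python
BANNED_TERMS = {"badword"}  # placeholder for a real moderation filter
FALLBACK = "I can't help with that request."

def pre_hook(prompt: str) -> str:
    # Reject an obvious prompt-injection attempt before it reaches the model.
    if "ignore previous instructions" in prompt.lower():
        raise ValueError("prompt attack detected")
    return prompt

def post_hook(response: str) -> str:
    # Replace a harmful output with an on-policy fallback response.
    if any(term in response.lower() for term in BANNED_TERMS):
        return FALLBACK
    return response
```

The pre-hook screens inputs before the LLM call; the post-hook screens outputs before they reach the user, so a detected issue never leaves the application.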

Sign up for preview

Get started with one line of code

Developers love Weave because it’s so easy to get started – all you need is one line of code, and your GenAI application inputs, outputs, and code are automatically tracked and organized for rigorous evaluation, monitoring, and iteration. We offer SDKs for Python, JavaScript, and TypeScript. For other languages, you can use our REST API.
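As a sketch of what that one-line setup looks like with the Python SDK: the project name below is a placeholder, and the snippet falls back to plain execution when Weave is not installed or no W&B API key is configured.

```python
import os

# Assumes `pip install weave`; degrade gracefully if it's absent.
try:
    import weave
    weave_available = True
except ImportError:
    weave_available = False

def reply(prompt: str) -> str:
    # Stand-in for a real LLM call; Weave would log its inputs and outputs.
    return f"echo: {prompt}"

if weave_available and os.environ.get("WANDB_API_KEY"):
    weave.init("my-genai-app")   # the one line that turns on tracking

    @weave.op()                  # each decorated call becomes a trace
    def generate(prompt: str) -> str:
        return reply(prompt)
else:
    generate = reply             # run untracked outside Weave

print(generate("Hello, Weave"))
```

Every call to the decorated function is then captured as a trace, with inputs, outputs, and code version organized for evaluation and monitoring.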

The Weights & Biases platform helps you streamline your workflow from end to end

Models

Experiments

Track and visualize your ML experiments

Sweeps

Optimize your hyperparameters

Registry

Publish and share your ML models and datasets

Automations

Trigger workflows automatically

Weave

Traces

Explore and debug LLMs

Evaluations

Rigorous evaluations of GenAI applications

Core

Artifacts

Version and manage your ML pipelines

Tables

Visualize and explore your ML data

Reports

Document and share your ML insights

SDK

Log ML experiments and artifacts at scale
