Enhancing AI safety, quality, and efficiency: The latest in the Weights & Biases and NVIDIA collaboration
A recap of all our improvements to the Weights & Biases platform, our growing partnership with NVIDIA, and an invitation to come say hi at booth 1336 at this year's GTC
Created on March 18 | Last edited on March 18
In 2024, and continuing into 2025, we have seen enterprises focus on building production AI applications, along with a growing need for infrastructure- and model-agnostic tools and packages to help them achieve these goals.
To meet these needs, Weights & Biases offers an AI platform that lets developers evaluate, monitor, and iterate on AI applications. Weights & Biases offers tools to track everything: inputs, outputs, metrics, prompts, code, and training experiments. Without proper tracking, you risk losing valuable IP from your experiments and make it hard, if not impossible, to reproduce results. That can force you to start from scratch instead of improving and optimizing your application. Together with NVIDIA, the W&B platform supports the full AI development workflow, with W&B Weave helping developers build and monitor AI applications and W&B Models helping them fine-tune AI models.
New innovations in W&B Weave
Based on input and feedback from enterprises of all sizes and industries, Weights & Biases has been rapidly adding new capabilities to help organizations move their AI applications to production faster.
Most recently, we introduced W&B Guardrails. Complete with a developer-friendly API, Guardrails is a set of pre-built scorers for safety and quality to support responsible AI. Safety scorers include toxicity, bias, PII detection, and hallucination detection, while quality scorers include coherence, fluency, and context relevance. In addition, W&B Weave provides a flexible framework that allows enterprises to bring their own custom guardrails or use third-party scorers within the platform.
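To illustrate what a custom guardrail might look like, here is a minimal sketch in plain Python. It does not use the W&B Guardrails API; the function names `toxicity_scorer` and `apply_guardrail`, the keyword blocklist, and the fallback message are all hypothetical. It only shows the general shape of the idea: a scorer takes a model output and returns a dictionary of scores, and the application decides what to do with a flagged response.

```python
import re

# Hypothetical blocklist for illustration only; a real toxicity guardrail
# would typically rely on a trained classifier rather than keywords.
BLOCKED_TERMS = {"idiot", "stupid"}


def toxicity_scorer(output: str) -> dict:
    """Flag an output if it contains any blocked term (toy keyword check)."""
    words = set(re.findall(r"[a-z']+", output.lower()))
    hits = sorted(words & BLOCKED_TERMS)
    return {"flagged": bool(hits), "matched_terms": hits}


def apply_guardrail(output: str, scorer=toxicity_scorer,
                    fallback: str = "Sorry, I can't help with that.") -> str:
    """Return the output unchanged, or a fallback message if the scorer flags it."""
    result = scorer(output)
    return fallback if result["flagged"] else output


print(apply_guardrail("Here is the summary you asked for."))  # passes through
print(apply_guardrail("That is a stupid question."))          # replaced by fallback
```

Keeping the scorer separate from the code that acts on its result makes it easy to swap the toy keyword check for a stronger detector without changing how the application responds to flagged outputs.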
Next, no matter how thoroughly you test, real-world usage often uncovers unexpected scenarios and edge cases. Sometimes a user request in production turns out to be a valuable example for future evaluations. With W&B Weave, you can build new datasets by adding selected traces, helping you create more effective evaluations faster. This feature is available in both the Weave UI and the SDK.
Additionally, when you need to quickly test and improve an AI application during evaluation, the Weave Playground now supports trials: you can generate multiple outputs for the same prompt to assess how robust the responses are. Simply increase the “Number of Trials” in the Playground settings sidebar and run your prompt. Then review the outputs to identify inconsistencies or outliers. You can use these insights to tune LLM settings such as temperature and to implement guardrails against prompt attacks and harmful responses.
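The idea behind trials can also be sketched outside the Playground: call the same model several times with one prompt and measure how much the answers agree. In this minimal sketch, `run_trials`, `find_outliers`, and the canned `fake_model` are hypothetical stand-ins for a real LLM call, and exact-match agreement after lowercasing is a deliberately simple consistency measure; free-form answers would usually need a semantic comparison instead.

```python
from collections import Counter
from itertools import cycle
from typing import Callable, List


def run_trials(generate: Callable[[str], str], prompt: str, n_trials: int) -> List[str]:
    """Call the same generator n_trials times on one prompt."""
    return [generate(prompt) for _ in range(n_trials)]


def find_outliers(outputs: List[str]) -> dict:
    """Compare each output with the most common (normalized) answer."""
    counts = Counter(o.strip().lower() for o in outputs)
    majority, majority_count = counts.most_common(1)[0]
    outliers = [o for o in outputs if o.strip().lower() != majority]
    return {
        "majority_answer": majority,
        "agreement": majority_count / len(outputs),
        "outliers": outliers,
    }


# Hypothetical stand-in for a nondeterministic model: cycles through canned answers.
canned = cycle(["Paris", "Paris", "paris", "Lyon"])
fake_model = lambda prompt: next(canned)

outputs = run_trials(fake_model, "What is the capital of France?", n_trials=4)
print(find_outliers(outputs))
# {'majority_answer': 'paris', 'agreement': 0.75, 'outliers': ['Lyon']}
```

A low agreement score is a signal to look more closely at the prompt or at sampling settings such as temperature before relying on the application's answers.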
Another common strategy to improve your AI applications is incorporating expert feedback. In W&B Weave, you can now create a custom UI for labelers and domain experts to generate consistent, high-quality annotations that can be reused across projects.
Lastly, for enterprises whose AI applications handle sensitive data, the W&B SDK offers Sensitive Data Protection. This capability automatically redacts personally identifiable information (PII) from a trace before it is sent to Weave servers, so enterprises with highly sensitive data and strict compliance requirements can still use the observability and other capabilities of W&B Weave for their AI use cases.
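Conceptually, client-side redaction means scrubbing a trace before it leaves your environment. The sketch below is not the W&B SDK's implementation: the regex patterns, the `<EMAIL>`/`<PHONE>` placeholder format, and the `redact` helper are hypothetical, and real PII detection covers many more entity types than email addresses and US-style phone numbers. It walks a trace-like dictionary and replaces matches before anything would be sent.

```python
import re
from typing import Any

# Hypothetical patterns for illustration; production PII detection
# handles many more entity types and formats.
PII_PATTERNS = {
    "EMAIL": re.compile(r"[\w.+-]+@[\w-]+\.[\w.-]+"),
    "PHONE": re.compile(r"\b\d{3}[-.\s]\d{3}[-.\s]\d{4}\b"),
}


def redact(value: Any) -> Any:
    """Recursively replace PII matches in strings, lists, and dicts with placeholders."""
    if isinstance(value, str):
        for label, pattern in PII_PATTERNS.items():
            value = pattern.sub(f"<{label}>", value)
        return value
    if isinstance(value, list):
        return [redact(v) for v in value]
    if isinstance(value, dict):
        return {k: redact(v) for k, v in value.items()}
    return value  # numbers, None, etc. pass through unchanged


trace = {
    "inputs": {"question": "Email me at jane.doe@example.com"},
    "output": "Call 555-123-4567",
    "latency_ms": 812,
}
print(redact(trace))
# {'inputs': {'question': 'Email me at <EMAIL>'}, 'output': 'Call <PHONE>', 'latency_ms': 812}
```

Running redaction on the client, rather than on the server, means the raw values never leave the machine that produced the trace.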
Expanding support for the NVIDIA ecosystem
In addition to these new W&B Weave capabilities, Weights & Biases is collaborating closely with NVIDIA to offer a strong combination for enterprises building AI applications and AI models. In particular, Weights & Biases is expanding support for both NVIDIA DGX supercomputing and the NVIDIA AI Enterprise software platform.
Weights & Biases was proud to be a launch partner for NVIDIA AI Blueprints this January. Specifically, we collaborated with the NVIDIA team to create a version of the AI Virtual Assistant NVIDIA AI Blueprint, enhancing it with observability provided by W&B Weave. You can read our getting started guide here.
At GTC, we're showcasing a preview of W&B Weave’s integration with the NVIDIA NeMo Evaluator microservice (currently in early access). The NeMo Evaluator microservice provides a set of enterprise-grade scoring metrics and LLM-as-a-judge capabilities for enterprises creating an AI center of excellence to accelerate the development of GenAI applications. The integration between W&B Weave and NeMo Evaluator lets developers easily log, analyze, and compare the performance of the foundation models powering their applications, and then quickly iterate to improve the quality of their applications' responses.
For organizations training and fine-tuning AI models, we are also announcing an integration with NVIDIA DGX Cloud, a unified AI platform that optimizes performance with software, services, and AI expertise for evolving workloads. The integration allows AI researchers training and fine-tuning models in DGX Cloud to log their experiment parameters and metrics to W&B Models without instrumenting their training scripts with W&B API calls.
Weights & Biases at GTC
And of course, we will be out in full force at GTC. If you’d like to see what we’ve been building, including any of the integrations mentioned above, you can request a meeting with the team or come chat with us at booth 1336. In addition, our co-founder and CEO, Lukas Biewald, will give a talk on the challenges and best practices for creating production GenAI applications on Thursday, March 20th, at 3 PM.
We look forward to seeing you there.