Skip to main content

Weave newsletter: LLM evals charts, AI ads, Groq and Cohere integrations

A tutorial on eval charts, using Weave to make a Weave ad, and new integrations
Created on September 12|Last edited on September 12
Hey Builders - Happy Llama 3.1 405B release day. We are so happy for the team at Meta and extremely proud they’re a Weights & Biases customer. We can’t wait to see what you will build with Llama 3.1 405B.
At Weights & Biases, we’ve been hard at work making W&B Weave—our lightweight toolkit for tracking and evaluating LLM applications—even better. From new integrations to tutorials to community projects, we wanted to share what’s new with Weave and how you can take advantage of the latest improvements.

LLM tip of the week

System prompt compression: Asking an LLM to reduce the length of your system prompt is a simple but effective way to save on token usage. We’ve used it for our user support bot this week and reduced token usage by ~500 tokens with no drop in accuracy.
mean prompt_tokens: originalmean prompt_tokens: compressed
8182.31637616.1531


New integrations

The Weave integrations team never rests. Now, Weave automatically logs calls made by both popular LLMs and orchestration frameworks including Groq, Cohere, DSPy, Google Gemini, Together AI, OpenRouter, and LangChain. To get automatic tracing and token usage, just call weave.init("my_LLM_project"). See the Weave integrations docs for the full list.

Product updates

Here’s a new video from our lead Weave engineer Tim showing off a new feature he built. It’s an intuitive, visual way that lets you compare evaluations and drill down into individual examples. Like our recent launch of Feedback, these comparisons will help your team improve their LLM evaluation workflows and make more informed decisions.


Building an AI-powered teacher’s assistant

A walkthrough of building an AI assistant and using LLMs to evaluate, powered by Groq and LlamaIndex.

A cookbook to detect factual inconsistencies

Based on Eugene Yan’s past work, this cookbooks shows you how to evaluate a baseline on the Factual Inconsistencies benchmark, fine-tune, and evaluate again.

Using Weave to make a Weave ad

We created a Weave ad (on YouTube) using SDXL, AnimateDiff and interpolation techniques.


Community

Eris v0.1 is a novel LLM evaluation framework using debate simulations built using W&B Weave Evaluations. Eris simulates full debate flows, including constructive speeches, cross-examinations, rebuttals, and closing arguments.

Need help getting started with W&B Weave?

Iterate on AI agents and models faster. Try Weights & Biases today.