Intuitive visualizations allow you to interrogate every step of your LLM program. Easily review past results, identify and debug errors, gather insights about model behavior, and share learnings with colleagues.

Understand and debug your LLM chains

Our new LLM debugging tool, with Trace Timeline and Trace Table, is a natural extension of our scalable experiment tracking, designed to support ML practitioners working on prompt engineering for LLMs. Visualize and drill down into every component and activity throughout the trace of your program.
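
To picture what a trace holds, here is a minimal sketch in plain Python. It is not the W&B schema or API: the `Span` class, its fields, and the helper names are hypothetical, chosen only to illustrate a chain run as nested, timed steps in which a failing step can be located.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Span:
    """One step in a chain run (hypothetical structure, not the W&B schema)."""
    name: str
    start_ms: int
    end_ms: int
    error: Optional[str] = None
    children: List["Span"] = field(default_factory=list)


def timeline(span: Span, depth: int = 0) -> List[str]:
    """Flatten a span tree into indented rows, one per step, in call order."""
    status = f"ERROR: {span.error}" if span.error else "ok"
    rows = [f"{'  ' * depth}{span.name} [{span.start_ms}-{span.end_ms} ms] {status}"]
    for child in span.children:
        rows.extend(timeline(child, depth + 1))
    return rows


def first_failure(span: Span) -> Optional[Span]:
    """Return the first erroring span in call order, preferring descendants
    over their parent so the most specific failing step is reported."""
    for child in span.children:
        found = first_failure(child)
        if found is not None:
            return found
    return span if span.error else None


# Illustrative data only: a three-step chain whose last step fails.
trace = Span("chain", 0, 950, error="tool call failed", children=[
    Span("prompt_template", 0, 5),
    Span("llm_call", 5, 600),
    Span("search_tool", 600, 950, error="HTTP 500"),
])

for row in timeline(trace):
    print(row)

failed = first_failure(trace)
print("failed step:", failed.name if failed else "none")
```

The indented rows correspond loosely to a timeline view, and the lookup mirrors the question you usually ask first: which step in the chain broke, and why.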

Drill into your model architecture

Practitioners doing prompt engineering need to understand how chain components are set up as part of their iteration and analysis. Model Architecture provides a detailed description of all the settings, tools, agents and prompt details within the topology of a particular chain.

Run OpenAI evaluations with W&B Launch

Use W&B Launch to run any evaluation from OpenAI Evals, a fast-growing repository of dozens of evaluation suites for LLMs, with just one click. Launch packages up everything you need to run the evaluation, logs the results in W&B Tables for easy visualization and analysis, and generates a Report for seamless collaboration. Use the one-line OpenAI integration to log OpenAI model inputs and outputs.

Visualize and analyze text data with W&B Tables

To better support prompt engineering practitioners working with text data, we’ve made several improvements to how we display text in Tables. You can now render Markdown and display the diff between two strings to better understand the impact of changes to your LLM prompts. Long-text fields also now include tooltips and string previews.
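
As a rough illustration of what a diff between two prompt versions tells you, the sketch below uses Python's standard `difflib` module. It is not how W&B Tables computes or renders diffs; the `prompt_diff` helper and the example prompts are made up for this illustration.

```python
import difflib
from typing import List


def prompt_diff(old: str, new: str) -> List[str]:
    """Return a word-level diff between two prompt strings.

    Each item is prefixed with '- ' (removed), '+ ' (added),
    or '  ' (unchanged).
    """
    diff = difflib.ndiff(old.split(), new.split())
    # ndiff can also emit '? ' hint lines for near-matches; drop them.
    return [token for token in diff if not token.startswith("? ")]


# Hypothetical prompt revisions, for illustration only.
old_prompt = "Summarize the article in three sentences."
new_prompt = "Summarize the article in two short sentences."

for token in prompt_diff(old_prompt, new_prompt):
    print(token)
```

Comparing at the word level keeps the output readable for short prompts. For long, multi-line prompts, splitting on lines with `splitlines()` instead of `split()` may be a better fit.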

Adam McCabe
Head of Data
“The challenge with GCP is you’re trying to parse terminal output. What I really like about Prompts is that when I get an error, I can see which step in the chain broke and why. Trying to get this out [otherwise] is such a pain.”
Peter Welinder
VP of Product, OpenAI
“We use W&B for pretty much all of our model training.”

Ellie Evans
VP of Product, OpenAI

“W&B lets us examine all of our candidate models at once. This is vital for understanding which model will work best for each customer. Reports have [also] been great for us. They allow us to seamlessly communicate nuanced technical information in a way that’s digestible for non-technical teams.”

