
A Guide to W&B Inference powered by CoreWeave

A quick primer on how you can use W&B Inference to find the best open-source model for your unique use case
Created on September 5 | Last edited on September 11
No single LLM is appropriate for every task. Some models might excel with longer context windows. Others may reason more effectively. Still others may simply prove too expensive for simpler tasks. This diversity is a big reason why most organizations don’t choose a one-size-fits-all LLM for all of their generative AI projects. Instead, companies more frequently experiment with different models to solve different problems.
Handling these distinct models from individual providers introduces complexities: everything from managing API keys to instrumenting code for tracing, evaluating, and improving agents built with different models.
That’s why we launched W&B Inference. W&B Inference provides API and playground access to leading open-source LLMs, including OpenAI’s GPT OSS, DeepSeek, Llama 4, and more. Plus, we’re adding new models constantly, so you’ll always be able to test the newest open-source offerings to see whether they perform better for your unique use case.
This piece is meant to serve as a menu for W&B Inference. We’ll link to each model we offer, walk through how the service works, and point to model-specific tutorials for a few of our newest offerings.

How W&B Inference works

We launched W&B Inference at our Fully Connected Conference in June of 2025 and it’s been growing ever since. We’re adding models constantly, most recently GLM-4.5 and DeepSeek-V3.1.
W&B Inference lets you quickly and easily compare multiple open-source models against each other to find the best performer for your use case. We’ll give you a high-level run-through in this post, but you can get a deeper look in both our launch blog and, of course, our docs.
Getting started is intuitive. Head to https://wandb.ai/inference/ and you’ll see every model we’re currently hosting:
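Beyond the browser, every hosted model is reachable over an OpenAI-compatible chat-completions API. Here's a minimal sketch using only the Python standard library; the base URL and the model id are assumptions drawn from the W&B docs, so check them against the current documentation before running this:

```python
# Minimal sketch of calling W&B Inference through its OpenAI-compatible
# chat-completions endpoint, stdlib only. BASE_URL and the model id used
# below are assumptions -- verify both against the current W&B docs.
import json
import urllib.request

BASE_URL = "https://api.inference.wandb.ai/v1"

def build_request(api_key: str, model: str, prompt: str) -> urllib.request.Request:
    """Assemble the HTTP request for a single-turn chat completion."""
    body = json.dumps({
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
    }).encode("utf-8")
    return urllib.request.Request(
        f"{BASE_URL}/chat/completions",
        data=body,
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
        method="POST",
    )

def chat(api_key: str, model: str, prompt: str) -> str:
    """Send one prompt and return the assistant's reply text."""
    with urllib.request.urlopen(build_request(api_key, model, prompt)) as resp:
        data = json.load(resp)
    return data["choices"][0]["message"]["content"]
```

Because the endpoint speaks the OpenAI wire format, the official `openai` client works here too: point its `base_url` at the inference endpoint and pass your W&B API key, and the rest of your existing code stays the same.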

For any given model, you can customize settings and parameters, add functions, add messages as system, assistant, or user prompts, adjust temperature, and more. But what’s especially powerful is comparing these models against each other in our Playground view. If you click “add model,” you’ll see a new column where you can select other hosted models (and adjust any of the settings mentioned above). Then, add a prompt to compare results:

In addition to the responses, at the bottom of each column you’ll see latency, token use, and price. From there, you can dig into a trace of any model to further interrogate its performance.
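The same side-by-side comparison can be scripted. Below is a small, hypothetical harness that runs one prompt against several models and records latency and token usage, mirroring what the Playground columns show. The inference call is injected as a function so the harness itself has no network dependency; the return shape (`"text"`, `"total_tokens"`) is an assumption for illustration, not the service's exact response schema:

```python
# Hypothetical sketch of a scripted model comparison: one prompt, several
# models, per-model latency and token counts. The `call` function (and its
# {"text", "total_tokens"} return shape) is an illustrative assumption --
# wire in your real inference client there.
import time
from typing import Callable

def compare_models(
    models: list[str],
    prompt: str,
    call: Callable[[str, str], dict],  # (model_id, prompt) -> {"text", "total_tokens"}
) -> list[dict]:
    """Return per-model wall-clock latency (seconds) and token counts."""
    results = []
    for model in models:
        start = time.perf_counter()
        reply = call(model, prompt)
        results.append({
            "model": model,
            "latency_s": time.perf_counter() - start,
            "total_tokens": reply["total_tokens"],
        })
    return results
```

From a table like this you can shortlist the cheapest or fastest acceptable model, then dig into individual traces for the finalists.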

Supported models on W&B Inference

As of this writing, we’re hosting 14 open-source models on W&B Inference. Here are their model cards, sorted chronologically by when we added them to our service:

Tutorials

We're also building tutorials and quickstarts for each new model. Each one covers the model's strengths and benchmarks, and includes code to get you up and running quickly.
Tutorial: Running inference with DeepSeek R1-0528 using W&B Inference
Getting set up and running DeepSeek R1-0528, DeepSeek's advanced long-context language model, in Python using W&B Inference.
Tutorial: Running inference with OpenAI's GPT OSS 20B using W&B Inference
Getting set up and running GPT OSS 20B, OpenAI's advanced language model, in Python using W&B Inference.
Tutorial: Running inference with Llama 3.1 8B using W&B Inference
Getting set up and running Llama 3.1 8B, Meta's advanced long-context language model, in Python using W&B Inference.
Tutorial: Running inference with Qwen3 235B A22B Thinking-2507 using W&B Inference
Getting set up and running Qwen3 235B A22B Thinking-2507, Alibaba's advanced reasoning model, in Python using W&B Inference.
Tutorial: Running inference with Zhipu AI's GLM-4.5 using W&B Inference
Getting set up and running GLM-4.5, Zhipu's advanced long-context language model, in Python using W&B Inference.
Tutorial: Running inference with Llama 3.3 70B using W&B Inference
Getting set up and running Llama 3.3 70B, Meta's advanced language model, in Python using W&B Inference.
Tutorial: Running inference with Llama 4 Scout using W&B Inference
Getting set up and running Llama 4 Scout, Meta's advanced long-context language model, in Python using W&B Inference.
Tutorial: Running inference with DeepSeek V3.1 using W&B Inference
Getting set up and running DeepSeek-V3.1, DeepSeek's advanced long-context language model, in Python using W&B Inference.
Tutorial: Running inference with Kimi K2 using W&B Inference
Getting set up and running Kimi K2, MoonShot AI's advanced long-context language model, in Python using W&B Inference. We'll be working with the moonshotai/Kimi-K2-Instruct model.