
A Guide to W&B Inference powered by CoreWeave

A quick primer on how you can use W&B Inference to find the best open-source model for your unique use case
Created on September 5 | Last edited on September 11
No single LLM is appropriate for every task. Some models might excel with longer context windows. Others may reason more effectively. Still others may simply prove too expensive for simpler tasks. This diversity is a big reason why most organizations don’t choose a one-size-fits-all LLM for all of their generative AI projects. Instead, companies more frequently experiment with different models to solve different problems.
Handling these distinct models from individual providers introduces complexities: everything from managing API keys to instrumenting code for tracing, evaluating, and improving agents built with different models.
That’s why we launched W&B Inference. W&B Inference provides API and playground access to leading open-source LLMs, including OpenAI’s GPT OSS, DeepSeek, Llama 4, and more. Plus, we’re adding new models constantly, so you’ll always be able to test the newest open-source offerings to see whether they perform better for your unique use case.
This piece is meant to serve as a menu for W&B Inference. We’ll link to each model we offer, walk through how the service works, and point to model-specific tutorials for a few of our newest offerings.

How W&B Inference works

We launched W&B Inference at our Fully Connected Conference in June of 2025 and it’s been growing ever since. We’re adding models constantly, most recently GLM-4.5 and DeepSeek-V3.1.
W&B Inference lets you quickly and easily compare multiple open-source models against each other to find the best performer for your use case. We’ll give you a high-level run-through in this post, but you can get a deeper look in both our launch blog and, of course, our docs.
Getting started is intuitive. Head to https://wandb.ai/inference/ and you’ll see every model we’re currently hosting:
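Beyond the browser, every hosted model is reachable over an OpenAI-compatible chat-completions API. Here's a minimal sketch using only the Python standard library; the base URL and the model id are assumptions drawn from the W&B docs, so check them against the current documentation before running this:

```python
# Minimal sketch of calling W&B Inference through its OpenAI-compatible
# chat-completions endpoint, stdlib only. BASE_URL and the model id used
# below are assumptions -- verify both against the current W&B docs.
import json
import urllib.request

BASE_URL = "https://api.inference.wandb.ai/v1"

def build_request(api_key: str, model: str, prompt: str) -> urllib.request.Request:
    """Assemble the HTTP request for a single-turn chat completion."""
    body = json.dumps({
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
    }).encode("utf-8")
    return urllib.request.Request(
        f"{BASE_URL}/chat/completions",
        data=body,
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
        method="POST",
    )

def chat(api_key: str, model: str, prompt: str) -> str:
    """Send one prompt and return the assistant's reply text."""
    with urllib.request.urlopen(build_request(api_key, model, prompt)) as resp:
        data = json.load(resp)
    return data["choices"][0]["message"]["content"]
```

Because the endpoint speaks the OpenAI wire format, the official `openai` client works here too: point its `base_url` at the inference endpoint and pass your W&B API key, and the rest of your existing code stays the same.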

For any given model, you can customize settings and parameters, add functions, add messages as system, assistant, or user prompts, adjust temperature, and more. But what’s especially powerful is comparing these models against each other in our Playground view. If you click “add model,” you’ll see a new column where you can select other hosted models (and adjust any of the settings mentioned above). Then, add a prompt to compare results:

In addition to the responses, at the bottom of each column you’ll see latency, token use, and price. From there, you can dig into a trace of any model to further interrogate its performance.
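The same side-by-side comparison can be scripted. Below is a small, hypothetical harness that runs one prompt against several models and records latency and token usage, mirroring what the Playground columns show. The inference call is injected as a function so the harness itself has no network dependency; the return shape (`"text"`, `"total_tokens"`) is an assumption for illustration, not the service's exact response schema:

```python
# Hypothetical sketch of a scripted model comparison: one prompt, several
# models, per-model latency and token counts. The `call` function (and its
# {"text", "total_tokens"} return shape) is an illustrative assumption --
# wire in your real inference client there.
import time
from typing import Callable

def compare_models(
    models: list[str],
    prompt: str,
    call: Callable[[str, str], dict],  # (model_id, prompt) -> {"text", "total_tokens"}
) -> list[dict]:
    """Return per-model wall-clock latency (seconds) and token counts."""
    results = []
    for model in models:
        start = time.perf_counter()
        reply = call(model, prompt)
        results.append({
            "model": model,
            "latency_s": time.perf_counter() - start,
            "total_tokens": reply["total_tokens"],
        })
    return results
```

From a table like this you can shortlist the cheapest or fastest acceptable model, then dig into individual traces for the finalists.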

Supported models on W&B Inference

As of this writing, we’re hosting 14 open-source models on W&B Inference. Here are their model cards, sorted chronologically by when we added them to our service:

Tutorials

We're also building tutorials and quickstarts for each new model. Each one covers the model's strengths and benchmarks, and includes code to get you up and running quickly.
Tutorial: Running inference with DeepSeek R1-0528 using W&B Inference
Getting set up and running DeepSeek R1-0528, DeepSeek's advanced long-context language model, in Python using W&B Inference.
Tutorial: Running inference with OpenAI's GPT OSS 20B using W&B Inference
Getting set up and running GPT OSS 20B, OpenAI's advanced language model, in Python using W&B Inference.
Tutorial: Running inference with Llama 3.1 8B using W&B Inference
Getting set up and running Llama 3.1 8B, Meta's advanced long-context language model, in Python using W&B Inference.
Tutorial: Running inference with Qwen3 235B A22B Thinking-2507 using W&B Inference
Getting set up and running Qwen3 235B A22B Thinking-2507, Alibaba's advanced reasoning model, in Python using W&B Inference.
Tutorial: Running inference with Zhipu AI's GLM-4.5 using W&B Inference
Getting set up and running GLM-4.5, Zhipu's advanced long-context language model, in Python using W&B Inference.
Tutorial: Running inference with Llama 3.3 70B using W&B Inference
Getting set up and running Llama 3.3 70B, Meta's advanced language model, in Python using W&B Inference.
Tutorial: Running inference with Llama 4 Scout using W&B Inference
Getting set up and running Llama 4 Scout, Meta's advanced long-context language model, in Python using W&B Inference.
Tutorial: Running inference with DeepSeek V3.1 using W&B Inference
Getting set up and running DeepSeek-V3.1, DeepSeek's advanced long-context language model, in Python using W&B Inference.
Tutorial: Running inference with Kimi K2 using W&B Inference
Getting set up and running Kimi K2, MoonShot AI's advanced long-context language model, in Python using W&B Inference. We'll be working with the moonshotai/Kimi-K2-Instruct model.