Available models

W&B Inference, powered by CoreWeave, provides API and playground access to leading open-source LLMs, including OpenAI GPT OSS, Qwen3, Kimi K2, Llama 4, DeepSeek, and Phi. Weights & Biases users can develop AI applications and agents without signing up for a hosting provider or hosting models themselves.

IBM Granite 4.1 8B

New

Apr 2026
$0.05 input / $0.10 output
131K
Granite 4.1 8B is a long-context instruct model capable of enhanced tool calling, instruction following, and chat capabi

Z.AI GLM 5.1

Text

New

Apr 2026
$1.40 input / $0.26 cached / $4.40 output
203K
Powerful MoE model for long-horizon agentic engineering and advanced reasoning.

Google Gemma 4 31B

Text

Vision

New

Apr 2026
$0.30 input / $1.25 output
262K
Gemma 4 31B Dense is designed for advanced reasoning, agentic workflows, and longer context and is natively trained on 1

NVIDIA Nemotron 3 Super 120B

Text

Mar 2026
$0.20 input / $0.80 output
262K
Nemotron 3 is a LatentMoE model designed to deliver strong agentic, reasoning, and conversational capabilities.

Qwen3.5 35B A3B

Text

Vision

Feb 2026
$0.25 input / $1.25 output
262K
Qwen3.5-35B-A3B is an open-weights multimodal MoE model built for efficient, high-throughput inference across chat, reasoning, and agentic tasks.

MiniMax M2.5

Text

Feb 2026
$0.30 input / $1.20 output
197K
MoE model with a highly sparse architecture designed for high throughput and low latency, with strong coding capabilities.

Z.AI GLM 5

Text

Deprecated

Feb 2026
$1.00 input / $3.20 output
203K
Mixture-of-Experts model for long-horizon agentic tasks with strong performance on reasoning and coding.

Moonshot AI Kimi K2.5

Text

Vision

Jan 2026
$0.60 input / $0.10 cached / $3.00 output
262K
Multimodal MoE language model featuring 32 billion activated parameters and a total of 1 trillion parameters.

DeepSeek V3.1

Text

Aug 2025
$0.55 input / $1.65 output
128K
A large hybrid model that supports both thinking and non-thinking modes via prompt templates.

OpenAI GPT OSS 20B

Text

Aug 2025
$0.05 input / $0.20 output
131K
Lower-latency Mixture-of-Experts model trained on OpenAI’s Harmony response format with reasoning capabilities.

OpenAI GPT OSS 120B

Text

Aug 2025
$0.15 input / $0.60 output
131K
Efficient Mixture-of-Experts model designed for high-reasoning, agentic, and general-purpose use cases.

Qwen3 30B A3B

Text

Jul 2025
$0.10 input / $0.30 output
262K
Qwen3-30B-A3B-Instruct-2507 is a 30.5B MoE instruction-tuned model with enhanced reasoning, coding, and long-context understanding.

Qwen3 235B A22B-2507

Text

Jul 2025
$0.10 input / $0.10 output
262K
Efficient multilingual, Mixture-of-Experts, instruction-tuned model, optimized for logical reasoning.

Qwen3 Coder 480B A35B

Text

Jul 2025
$1.00 input / $1.50 output
262K
Mixture-of-Experts model optimized for agentic coding tasks such as function calling, tool use, and long-context reasoning.

Qwen3 235B A22B Thinking-2507

Text

Jul 2025
$0.10 input / $0.10 output
262K
High-performance Mixture-of-Experts model optimized for structured reasoning, math, and long-form generation.

OpenPipe Qwen3 14B Instruct

Text

Apr 2025
$0.05 input / $0.22 output
33K
An efficient multilingual, dense, instruction-tuned model, optimized by OpenPipe for building agents with finetuning.

Meta Llama 4 Scout

Text

Vision

Apr 2025
$0.17 input / $0.66 output
64K
Multimodal model integrating text and image understanding, ideal for visual tasks and combined analysis.

Microsoft Phi 4 Mini 3.8B

Text

Feb 2025
$0.08 input / $0.35 output
128K
Compact, efficient model ideal for fast responses in resource-constrained environments.

Meta Llama 3.3 70B

Text

Dec 2024
$0.71 input / $0.71 output
128K
Multilingual model excelling in conversational tasks, detailed instruction-following, and coding.

Meta Llama 3.1 70B

Text

Jul 2024
$0.80 input / $0.80 output
128K
Efficient conversational model optimized for responsive multilingual chatbot interactions.

Meta Llama 3.1 8B

Text

Jul 2024
$0.22 input / $0.22 output
128K
Efficient conversational model optimized for responsive multilingual chatbot interactions.
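
The input, cached, and output prices listed for each model can be combined into a rough per-request cost estimate. The sketch below is illustrative only. It assumes the prices are quoted in USD per 1 million tokens (this page does not state the unit) and that cached input tokens are billed at the cached rate in place of the input rate. The names `Pricing`, `estimate_cost`, and `GLM_5_1` are hypothetical helpers, not part of any W&B API; the figures are the Z.AI GLM 5.1 prices from the list above.

```python
from dataclasses import dataclass
from typing import Optional

# ASSUMPTION: listed prices are USD per 1,000,000 tokens (unit not stated on this page).
TOKENS_PER_PRICE_UNIT = 1_000_000


@dataclass(frozen=True)
class Pricing:
    """Listed prices for one model (hypothetical helper type)."""
    input_usd: float
    output_usd: float
    cached_usd: Optional[float] = None  # None when no cached price is listed


def estimate_cost(pricing: Pricing, input_tokens: int, output_tokens: int,
                  cached_tokens: int = 0) -> float:
    """Estimate the USD cost of one request.

    ASSUMPTION: cached_tokens is the portion of input_tokens served from
    cache, billed at the cached rate instead of the regular input rate.
    """
    if min(input_tokens, output_tokens, cached_tokens) < 0:
        raise ValueError("token counts must be non-negative")
    if cached_tokens > input_tokens:
        raise ValueError("cached_tokens cannot exceed input_tokens")

    # Fall back to the regular input rate when no cached price is listed.
    cached_rate = pricing.cached_usd if pricing.cached_usd is not None else pricing.input_usd
    uncached_tokens = input_tokens - cached_tokens

    total = (uncached_tokens * pricing.input_usd
             + cached_tokens * cached_rate
             + output_tokens * pricing.output_usd)
    return total / TOKENS_PER_PRICE_UNIT


# Prices copied from the Z.AI GLM 5.1 entry above.
GLM_5_1 = Pricing(input_usd=1.40, output_usd=4.40, cached_usd=0.26)

if __name__ == "__main__":
    # 20,000 input tokens (15,000 of them cached) and 2,000 output tokens.
    cost = estimate_cost(GLM_5_1, input_tokens=20_000, output_tokens=2_000,
                         cached_tokens=15_000)
    print(f"Estimated cost: ${cost:.4f}")
```

For models without a cached price, leave `cached_usd` unset; the estimate then bills all input tokens at the input rate.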