Available models
W&B Inference, powered by CoreWeave, provides API and playground access to leading open-source LLMs, including OpenAI GPT OSS, Qwen3, Kimi K2, Llama 4, DeepSeek, and Phi. Weights & Biases users can develop AI applications and agents without signing up for a hosting provider or hosting models themselves.
NVIDIA Nemotron 3 Super 120B
Text
New
Mar 2026
$0.20 input / $0.80 output
262K
Nemotron 3 is a LatentMoE model designed to deliver strong agentic, reasoning, and conversational capabilities.
MiniMax M2.5
Text
New
Feb 2026
$0.30 input / $1.20 output
197K
MoE model with a highly sparse architecture designed for high throughput and low latency, with strong coding capabilities.
Z.AI GLM 5
Text
New
Feb 2026
$1.00 input / $3.20 output
203K
Mixture-of-Experts model for long-horizon agentic tasks with strong performance on reasoning and coding.
Moonshot AI Kimi K2.5
Text
Vision
Jan 2026
$0.50 input / $2.85 output
262K
Multimodal Mixture-of-Experts language model with 32 billion activated parameters out of 1 trillion total parameters.
DeepSeek V3.1
Text
Aug 2025
$0.55 input / $1.65 output
128K
A large hybrid model that supports both thinking and non-thinking modes via prompt templates.
OpenAI GPT OSS 20B
Text
Aug 2025
$0.05 input / $0.20 output
131K
Lower-latency Mixture-of-Experts model with reasoning capabilities, trained on OpenAI’s Harmony response format.
OpenAI GPT OSS 120B
Text
Aug 2025
$0.15 input / $0.60 output
131K
Efficient Mixture-of-Experts model designed for high-reasoning, agentic, and general-purpose use cases.
Qwen3 30B A3B
Text
Jul 2025
$0.10 input / $0.30 output
262K
Qwen3-30B-A3B-Instruct-2507 is a 30.5B MoE instruction-tuned model with enhanced reasoning, coding, and long-context understanding.
Qwen3 235B A22B-2507
Text
Jul 2025
$0.10 input / $0.10 output
262K
Efficient multilingual, Mixture-of-Experts, instruction-tuned model, optimized for logical reasoning.
Qwen3 Coder 480B A35B
Text
Jul 2025
$1.00 input / $1.50 output
262K
Mixture-of-Experts model optimized for agentic coding tasks such as function calling, tool use, and long-context reasoning.
Qwen3 235B A22B Thinking-2507
Text
Jul 2025
$0.10 input / $0.10 output
262K
High-performance Mixture-of-Experts model optimized for structured reasoning, math, and long-form generation.
OpenPipe Qwen3 14B Instruct
Text
Apr 2025
$0.05 input / $0.22 output
33K
An efficient multilingual, dense, instruction-tuned model, optimized by OpenPipe for building agents with fine-tuning.
Meta Llama 4 Scout
Text
Vision
Apr 2025
$0.17 input / $0.66 output
64K
Multimodal model integrating text and image understanding, ideal for visual tasks and combined analysis.
Microsoft Phi 4 Mini 3.8B
Text
Feb 2025
$0.08 input / $0.35 output
128K
Compact, efficient model ideal for fast responses in resource-constrained environments.
Meta Llama 3.3 70B
Text
Dec 2024
$0.71 input / $0.71 output
128K
Multilingual model excelling in conversational tasks, detailed instruction-following, and coding.
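The input and output prices listed above can be turned into a rough cost estimate for a workload. The sketch below is illustrative only: the listing does not state the pricing unit, so the code assumes prices are in USD per 1 million tokens, and the names `PRICES`, `TOKENS_PER_PRICE_UNIT`, and `estimate_cost` are hypothetical helpers, not part of any W&B API. Only a few models are included; the prices are copied from the listing.

```python
# ASSUMPTION: listed prices are USD per 1,000,000 tokens (the unit is not
# stated in the listing). Adjust this constant if the actual unit differs.
TOKENS_PER_PRICE_UNIT = 1_000_000

# (input price, output price) copied from the listing above.
# The dictionary keys are display names, not API model identifiers.
PRICES = {
    "OpenAI GPT OSS 20B": (0.05, 0.20),
    "OpenAI GPT OSS 120B": (0.15, 0.60),
    "Qwen3 Coder 480B A35B": (1.00, 1.50),
    "Meta Llama 3.3 70B": (0.71, 0.71),
}


def estimate_cost(model: str, input_tokens: int, output_tokens: int) -> float:
    """Return the estimated cost in USD for one workload on one model."""
    if input_tokens < 0 or output_tokens < 0:
        raise ValueError("token counts must be non-negative")
    input_price, output_price = PRICES[model]
    total = input_tokens * input_price + output_tokens * output_price
    return total / TOKENS_PER_PRICE_UNIT


# Compare the same workload (2M input tokens, 500K output tokens) across models.
for name in PRICES:
    cost = estimate_cost(name, 2_000_000, 500_000)
    print(f"{name}: ${cost:.2f}")
```

Keeping input and output prices separate matters because several models charge noticeably more for output tokens than for input tokens, so a generation-heavy workload can rank the models differently than a prompt-heavy one.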