TESTING 250729 - new inference models grid

Available models

W&B Inference, powered by CoreWeave, provides API and playground access to leading open-source LLMs, including OpenAI GPT OSS, Qwen3, Kimi K2, Llama 4, DeepSeek, and Phi. Weights & Biases users can develop AI applications and agents without signing up for a hosting provider or hosting models themselves.

Z.AI GLM 5.2

Text

Jun 2026

$0.76 input

$0.14 cached

$2.42 output

262K

GLM-5.2 is an MoE language model with 40 billion activated parameters and 744 billion total parameters.

MiniMax M3

Vision

Jun 2026

$0.23 input

$0.05 cached

$0.96 output

262K

MiniMax M3 is a multimodal MoE model with 23B active parameters optimized for coding and agentic workflows.

Moonshot AI Kimi K2.7 Code

Vision

Jun 2026

$0.71 input

$0.15 cached

$3.50 output

262K

A 1T-parameter MoE model with 32B active parameters, built for long-horizon agentic coding and software engineering.

NVIDIA Nemotron 3 Ultra

Jun 2026

$0.75 input

$0.15 cached

$2.75 output

262K

A powerful MoE model designed for long-running agents across coding, deep research, and enterprise automation.

JetBrains Mellum2 12B A2.5B

New

Jun 2026

$0.05 input

$0.10 output

131K

Mellum2-12B-A2.5B-Instruct is a fast MoE model with 131K context built for coding, tool use, and low-latency AI workflow

IBM Granite 4.1 8B

Apr 2026

$0.05 input

$0.10 output

131K

Granite 4.1 8B is a long-context instruct model with enhanced tool calling, instruction following, and chat capabilities.

DeepSeek V4-Pro

Text

Apr 2026

$1.74 input

$0.14 cached

$3.46 output

DeepSeek V4-Pro is a 1.6T-parameter MoE model with 49B active parameters that excels at advanced reasoning and coding.

DeepSeek V4-Flash

Experimental

Apr 2026

$0.14 input

$0.07 cached

$0.28 output

DeepSeek V4-Flash is an MoE model with a 1M context length, suited to coding, reasoning, and agentic workloads.

Qwen3.6 27B

Vision

New

Apr 2026

$0.60 input

$0.12 cached

$3.60 output

262K

Qwen3.6-27B is a 27B dense multimodal model with 262K context built for flagship-level agentic coding.

Moonshot AI Kimi K2.6

Vision

Apr 2026

$0.65 input

$0.15 cached

$3.41 output

262K

Kimi K2.6 is a multimodal Mixture-of-Experts language model with 32 billion activated parameters.

Z.AI GLM 5.1

Text

New

Apr 2026

$1.40 input

$0.26 cached

$4.40 output

203K

Powerful MoE model for long-horizon agentic engineering and advanced reasoning.

Google Gemma 4 31B

Text

Vision

Apr 2026

$0.10 input

$0.34 output

262K

Gemma 4 31B Dense is designed for advanced reasoning, agentic workflows, and longer context and is natively trained on 1

Qwen3.6 35B A3B

Vision

Apr 2026

$0.25 input

$1.25 output

262K

Qwen3.6-35B-A3B is an MoE multimodal model with 262K context optimized for agentic coding workflows.

NVIDIA Nemotron 3 Super 120B

Text

Mar 2026

$0.20 input

$0.80 output

262K

Nemotron 3 is a LatentMoE model designed to deliver strong agentic, reasoning, and conversational capabilities.

Qwen3.5 27B

Vision

Experimental

Feb 2026

$0.39 input

$0.08 cached

$3.12 output

262K

Qwen3.5-27B is a dense model from the Qwen3.5 family built for high performance across a large range of benchmarks.

Qwen3.5 35B A3B

Text

Vision

Feb 2026

$0.25 input

$1.25 output

262K

Qwen3.5-35B-A3B is an open-weights multimodal MoE model built for efficient, high-throughput inference across chat, reasoning, and agentic tasks.

MiniMax M2.5

Text

Feb 2026

$0.30 input

$1.20 output

197K

MoE model with a highly sparse architecture designed for high throughput and low latency, with strong coding capabilities.

Moonshot AI Kimi K2.5

Text

Vision

Jan 2026

$0.60 input

$0.10 cached

$3.00 output

262K

Multimodal MoE language model with 32 billion activated parameters and 1 trillion total parameters.

DeepSeek V3.1

Text

Aug 2025

$0.55 input

$1.65 output

128K

A large hybrid model that supports both thinking and non-thinking modes via prompt templates.

OpenAI GPT OSS 20B

Text

Aug 2025

$0.05 input

$0.20 output

131K

Lower-latency Mixture-of-Experts model with reasoning capabilities, trained on OpenAI’s Harmony response format.

OpenAI GPT OSS 120B

Text

Aug 2025

$0.03 input

$0.17 output

131K

Efficient Mixture-of-Experts model designed for high-reasoning, agentic and general-purpose use cases.

Qwen3 30B A3B

Text

Jul 2025

$0.10 input

$0.30 output

262K

Qwen3-30B-A3B-Instruct-2507 is a 30.5B MoE instruction-tuned model with enhanced reasoning, coding, and long-context understanding.

Qwen3 235B A22B-2507

Text

Jul 2025

$0.10 input

$0.10 output

262K

Efficient multilingual, Mixture-of-Experts, instruction-tuned model, optimized for logical reasoning.

Qwen3 Coder 480B A35B

Text

Jul 2025

$1.00 input

$1.50 output

262K

Mixture-of-Experts model optimized for agentic coding tasks such as function calling, tool use, and long-context reasoning.

Qwen3 235B A22B Thinking-2507

Text

Jul 2025

$0.10 input

$0.10 output

262K

High-performance Mixture-of-Experts model optimized for structured reasoning, math, and long-form generation.

OpenPipe Qwen3 14B Instruct

Text

Apr 2025

$0.05 input

$0.22 output

33K

An efficient multilingual, dense, instruction-tuned model, optimized by OpenPipe for building agents with finetuning.

Microsoft Phi 4 Mini 3.8B

Text

Feb 2025

$0.08 input

$0.35 output

128K

Compact, efficient model ideal for fast responses in resource-constrained environments.

Meta Llama 3.3 70B

Text

Dec 2024

$0.71 input

$0.71 output

128K

Multilingual model excelling in conversational tasks, detailed instruction-following, and coding.

Meta Llama 3.1 70B

Text

Jul 2024

$0.80 input

$0.80 output

128K

Efficient conversational model optimized for responsive multilingual chatbot interactions.

Meta Llama 3.1 8B

Text

Jul 2024

$0.22 input

$0.22 output

128K

Efficient conversational model optimized for responsive multilingual chatbot interactions.
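
The input, cached, and output prices listed above can be turned into a rough per-request cost estimate. The sketch below is illustrative only and rests on two assumptions that the listing does not state: that prices are quoted in USD per 1 million tokens, and that cached tokens are a subset of the input tokens and are billed at the cached rate instead of the input rate. The names `Pricing` and `estimate_cost` are hypothetical helpers, not part of any W&B API.

```python
from dataclasses import dataclass
from typing import Optional

# ASSUMPTION: listed prices are per 1,000,000 tokens (not stated in the listing).
TOKENS_PER_PRICE_UNIT = 1_000_000


@dataclass(frozen=True)
class Pricing:
    """Listed prices for one model, in USD per price unit."""
    input_usd: float
    output_usd: float
    cached_usd: Optional[float] = None  # None when no cached price is listed


def estimate_cost(pricing: Pricing, input_tokens: int, output_tokens: int,
                  cached_tokens: int = 0) -> float:
    """Estimate the USD cost of one request.

    ASSUMPTION: cached_tokens counts the part of input_tokens served from
    cache; those tokens are charged at the cached rate when one is listed,
    otherwise at the normal input rate.
    """
    if min(input_tokens, output_tokens, cached_tokens) < 0:
        raise ValueError("token counts must be non-negative")
    if cached_tokens > input_tokens:
        raise ValueError("cached_tokens cannot exceed input_tokens")

    cached_rate = pricing.cached_usd if pricing.cached_usd is not None else pricing.input_usd
    fresh_tokens = input_tokens - cached_tokens
    total = (fresh_tokens * pricing.input_usd
             + cached_tokens * cached_rate
             + output_tokens * pricing.output_usd)
    return total / TOKENS_PER_PRICE_UNIT


# Prices copied from the listing above.
GLM_5_2 = Pricing(input_usd=0.76, output_usd=2.42, cached_usd=0.14)
LLAMA_3_1_8B = Pricing(input_usd=0.22, output_usd=0.22)

if __name__ == "__main__":
    cost = estimate_cost(GLM_5_2, input_tokens=100_000, output_tokens=10_000,
                         cached_tokens=40_000)
    print(f"GLM 5.2 estimate: ${cost:.4f}")
```

Under these assumptions, output tokens dominate the bill for models with a large gap between input and output prices, so comparing models on input price alone can be misleading for generation-heavy workloads.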
