Available models

W&B Inference, powered by CoreWeave, provides API and playground access to leading open-source LLMs, including OpenAI GPT OSS, Qwen3, Kimi K2, Llama 4, DeepSeek, and Phi. Weights & Biases users can develop AI applications and agents without signing up for a hosting provider or hosting models themselves.

NVIDIA Nemotron 3 Super 120B

Text

New

Mar 2026

$0.20 input

$0.80 output

262K

Nemotron 3 is a LatentMoE model designed to deliver strong agentic, reasoning, and conversational capabilities.

MiniMax M2.5

Text

New

Feb 2026

$0.30 input

$1.20 output

197K

MoE model with a highly sparse architecture designed for high throughput and low latency, with strong coding capabilities.

Z.AI GLM 5

Text

New

Feb 2026

$1.00 input

$3.20 output

203K

Mixture-of-Experts model for long-horizon agentic tasks with strong performance on reasoning and coding.

Moonshot AI Kimi K2.5

Text

Vision

Jan 2026

$0.50 input

$2.85 output

262K

Multimodal Mixture-of-Experts language model featuring 32 billion activated parameters and a total of 1 trillion parameters.

DeepSeek V3.1

Text

Aug 2025

$0.55 input

$1.65 output

128K

A large hybrid model that supports both thinking and non-thinking modes via prompt templates.

OpenAI GPT OSS 20B

Text

Aug 2025

$0.05 input

$0.20 output

131K

Lower-latency Mixture-of-Experts model trained on OpenAI’s Harmony response format, with reasoning capabilities.

OpenAI GPT OSS 120B

Text

Aug 2025

$0.15 input

$0.60 output

131K

Efficient Mixture-of-Experts model designed for high-reasoning, agentic, and general-purpose use cases.

Qwen3 30B A3B

Text

Jul 2025

$0.10 input

$0.30 output

262K

Qwen3-30B-A3B-Instruct-2507 is a 30.5B MoE instruction-tuned model with enhanced reasoning, coding, and long-context understanding.

Qwen3 235B A22B-2507

Text

Jul 2025

$0.10 input

$0.10 output

262K

Efficient multilingual, Mixture-of-Experts, instruction-tuned model, optimized for logical reasoning.

Qwen3 Coder 480B A35B

Text

Jul 2025

$1.00 input

$1.50 output

262K

Mixture-of-Experts model optimized for agentic coding tasks such as function calling, tool use, and long-context reasoning.

Qwen3 235B A22B Thinking-2507

Text

Jul 2025

$0.10 input

$0.10 output

262K

High-performance Mixture-of-Experts model optimized for structured reasoning, math, and long-form generation.

OpenPipe Qwen3 14B Instruct

Text

Apr 2025

$0.05 input

$0.22 output

33K

An efficient multilingual, dense, instruction-tuned model, optimized by OpenPipe for building agents with finetuning.

Meta Llama 4 Scout

Text

Vision

Apr 2025

$0.17 input

$0.66 output

64K

Multimodal model integrating text and image understanding, ideal for visual tasks and combined analysis.

Microsoft Phi 4 Mini 3.8B

Text

Feb 2025

$0.08 input

$0.35 output

128K

Compact, efficient model ideal for fast responses in resource-constrained environments.

Meta Llama 3.3 70B

Text

Dec 2024

$0.71 input

$0.71 output

128K

Multilingual model excelling in conversational tasks, detailed instruction-following, and coding.

Meta Llama 3.1 70B

Text

Jul 2024

$0.80 input

$0.80 output

128K

Efficient conversational model optimized for responsive multilingual chatbot interactions.

Meta Llama 3.1 8B

Text

Jul 2024

$0.22 input

$0.22 output

128K

Efficient conversational model optimized for responsive multilingual chatbot interactions.
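
To compare models on cost, the input and output prices listed above can be combined with an expected token volume. The sketch below is illustrative only: the listing does not state the pricing unit, so the code assumes prices are quoted in USD per 1 million tokens, and the `estimate_cost` helper and the sample token counts are hypothetical.

```python
# Prices copied from the listing above as (input, output) in USD.
# ASSUMPTION: prices are per 1 million tokens; the listing does not state the unit.
PRICES = {
    "OpenAI GPT OSS 20B": (0.05, 0.20),
    "Qwen3 Coder 480B A35B": (1.00, 1.50),
    "Meta Llama 3.3 70B": (0.71, 0.71),
}

TOKENS_PER_PRICE_UNIT = 1_000_000  # assumed pricing unit


def estimate_cost(model: str, input_tokens: int, output_tokens: int) -> float:
    """Return the estimated cost in USD for one workload on the given model."""
    input_price, output_price = PRICES[model]
    total = input_tokens * input_price + output_tokens * output_price
    return total / TOKENS_PER_PRICE_UNIT


if __name__ == "__main__":
    # Hypothetical workload: 200K input tokens and 50K output tokens.
    for name in PRICES:
        cost = estimate_cost(name, input_tokens=200_000, output_tokens=50_000)
        print(f"{name}: ${cost:.4f}")
```

Keeping input and output prices separate matters because several models in the listing charge more for output than for input, so the ratio of generated to prompted tokens changes which model is cheapest for a given workload.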
