MiniMax M2.5

MiniMax M2.5 inference overview

Price per 1M tokens: $0.30 (input), $1.20 (output)

Parameters: 10B (active), 230B (total)

Context Window: 197K

Release Date: Feb 2026
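
As a rough illustration of the listed prices, the sketch below estimates the cost of a single request from its token counts. The function name `estimate_cost_usd` and the example token counts are hypothetical; only the two per-million-token rates come from this page, and actual billing may differ.

```python
# Listed prices from this page, in USD per 1M tokens.
INPUT_USD_PER_M = 0.30
OUTPUT_USD_PER_M = 1.20


def estimate_cost_usd(input_tokens: int, output_tokens: int) -> float:
    """Estimate request cost in USD (hypothetical helper, not an official API)."""
    total = input_tokens * INPUT_USD_PER_M + output_tokens * OUTPUT_USD_PER_M
    return total / 1_000_000


# Example: a 50,000-token prompt with a 2,000-token completion.
print(f"${estimate_cost_usd(50_000, 2_000):.4f}")  # $0.0174
```

At these rates an output token costs four times as much as an input token, so long completions tend to dominate the bill even when prompts are large.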

MiniMax M2.5 inference details

MiniMax M2.5 is a Mixture-of-Experts (MoE) model with 230 billion total parameters, of which 10 billion are active per token during inference. Because only a small fraction of the parameters is used for each token, this sparse architecture supports high-throughput, low-latency inference while retaining strong coding capabilities.
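
To make the sparsity concrete, the short sketch below works out the share of parameters active per token from the two figures quoted above. The variable names are illustrative, and treating per-token compute as proportional to active parameters is only a rule of thumb, not a measured result.

```python
# Parameter counts quoted on this page, in billions.
TOTAL_PARAMS_B = 230
ACTIVE_PARAMS_B = 10

# Fraction of the model's parameters used for any single token.
active_fraction = ACTIVE_PARAMS_B / TOTAL_PARAMS_B

# How many times larger the full model is than the active subset.
sparsity_ratio = TOTAL_PARAMS_B / ACTIVE_PARAMS_B

print(f"Active per token: {active_fraction:.1%}")  # Active per token: 4.3%
print(f"Total / active:   {sparsity_ratio:.0f}x")  # Total / active:   23x
```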

Created by: MiniMax

License: other

Model card: MiniMax-M2.5

				
import openai
import weave

# Weave autopatches OpenAI to log LLM calls to W&B
weave.init("<team>/<project>")

client = openai.OpenAI(
    # The custom base URL points to W&B Inference
    base_url='https://api.inference.wandb.ai/v1',

    # Get your API key from https://wandb.ai/authorize
    # Consider setting it in the environment as OPENAI_API_KEY instead for safety
    api_key="<your-apikey>",

    # Optional: Team and project for usage tracking
    project="<team>/<project>",
)

response = client.chat.completions.create(
    model="MiniMaxAI/MiniMax-M2.5",
    messages=[
        {"role": "system", "content": "You are a helpful assistant."},
        {"role": "user", "content": "Tell me a joke."}
    ],
)

print(response.choices[0].message.content)
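
The comment in the snippet above suggests keeping the API key in the environment rather than in source code. One way to do that is sketched below, assuming the key is stored in `OPENAI_API_KEY`; the helper name `wandb_client_kwargs` is hypothetical, and the base URL is the one used in the snippet above.

```python
import os


def wandb_client_kwargs(env=os.environ) -> dict:
    """Build keyword arguments for openai.OpenAI from the environment.

    Hypothetical helper: fails early with a clear message if the key is missing.
    """
    api_key = env.get("OPENAI_API_KEY")
    if not api_key:
        raise RuntimeError("Set OPENAI_API_KEY before creating the client.")
    return {
        # Same W&B Inference endpoint as in the snippet above.
        "base_url": "https://api.inference.wandb.ai/v1",
        "api_key": api_key,
    }
```

The client could then be created with `openai.OpenAI(**wandb_client_kwargs())`, which keeps the key out of version control. The OpenAI Python client also reads `OPENAI_API_KEY` on its own when `api_key` is omitted, so the helper mainly adds an explicit error when the variable is unset.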

MiniMax M2.5 resources

Course

AI engineering course: Agents

Guide

W&B Inference powered by CoreWeave

Whitepaper

A primer on building successful AI agents
