Deepseek V3.1 on W&B Inference
Price per 1M tokens
$0.55 (input)
$1.65 (output)
Parameters
37B (Active)
671B (Total)
Context window
128K
Release date
Aug 2025
Deepseek V3.1 inference details
DeepSeek-V3.1 is a large hybrid reasoning model (671B parameters, 37B active) that supports both thinking and non-thinking modes via prompt templates. It extends the DeepSeek-V3 base with a two-phase long-context training process, reaching up to 128K tokens, and uses FP8 microscaling for efficient inference. The model improves code generation, and reasoning efficiency, achieving performance comparable to DeepSeek R1-0528 on difficult benchmarks while responding more quickly. It is suitable for research, coding, and agentic workflows. It succeeds the DeepSeek V3-0324 model and performs well on a variety of tasks.
Created by: DeepSeek
License: mit
🤗 model card: DeepSeek-V3.1
import openai
import weave
# Weave autopatches OpenAI to log LLM calls to W&B
weave.init("/")
client = openai.OpenAI(
# The custom base URL points to W&B Inference
base_url='https://api.inference.wandb.ai/v1',
# Get your API key from https://wandb.ai/authorize
# Consider setting it in the environment as OPENAI_API_KEY instead for safety
api_key="",
# Team and project are required for usage tracking
project="/",
)
response = client.chat.completions.create(
model="deepseek-ai/DeepSeek-V3.1",
messages=[
{"role": "system", "content": "You are a helpful assistant."},
{"role": "user", "content": "Tell me a joke."}
],
)
print(response.choices[0].message.content)