DeepSeek V4-Flash inference overview

Price per 1M tokens

$0.01 (input)
$0.01 (output)

Parameters

13B (active)
284B (total)

Context Window

1M

Release Date

Apr 2026
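At $0.01 per 1M tokens for both input and output, request cost scales linearly with token counts. A minimal sketch of the arithmetic (the helper name and the example token counts are illustrative, not part of any API):

```python
RATE_PER_M = 0.01  # dollars per 1,000,000 tokens (same rate for input and output)

def estimate_cost(input_tokens: int, output_tokens: int) -> float:
    """Return the dollar cost of a single request at the listed rate."""
    return (input_tokens + output_tokens) / 1_000_000 * RATE_PER_M

# A 900k-token prompt with a 4k-token completion:
print(f"${estimate_cost(900_000, 4_000):.6f}")  # → $0.009040
```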

DeepSeek V4-Flash inference details

DeepSeek V4-Flash is a Mixture-of-Experts (MoE) model with a 1M-token context window. Its smaller size within the DeepSeek V4 family makes it well suited to fast, efficient coding, reasoning, and agentic workloads.

Created by: 

DeepSeek

License: 

MIT

Model card: 

import openai
import weave

# Weave autopatches OpenAI to log LLM calls to W&B
weave.init("<team>/<project>")

client = openai.OpenAI(
    # The custom base URL points to W&B Inference
    base_url='https://api.inference.wandb.ai/v1',

    # Get your API key from https://wandb.ai/authorize
    # For safety, prefer setting it via the OPENAI_API_KEY environment variable instead
    api_key="<your-apikey>",

    # Optional: Team and project for usage tracking
    project="<team>/<project>",
)

response = client.chat.completions.create(
    model="deepseek-ai/DeepSeek-V4-Flash",
    messages=[
        {"role": "system", "content": "You are a helpful assistant."},
        {"role": "user", "content": "Tell me a joke."}
    ],
)

print(response.choices[0].message.content)
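Before sending a very long prompt, it can help to sanity-check that it fits the 1M-token context window. A rough sketch using the common ~4-characters-per-token heuristic (an approximation only; the model's actual tokenizer will differ, and the function name and default reserve are illustrative):

```python
CONTEXT_WINDOW = 1_000_000  # DeepSeek V4-Flash's listed context length
CHARS_PER_TOKEN = 4         # rough heuristic; real tokenization varies by text

def fits_in_context(text: str, reserve_for_output: int = 8_000) -> bool:
    """Approximate check that a prompt leaves room for the completion."""
    estimated_tokens = len(text) / CHARS_PER_TOKEN
    return estimated_tokens + reserve_for_output <= CONTEXT_WINDOW

# ~600k characters -> roughly 150k tokens, well within the window:
print(fits_in_context("hello " * 100_000))  # → True
```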

DeepSeek V4-Flash resources

Course
AI engineering course: Agents
Guide
W&B Inference powered by CoreWeave
Whitepaper
A primer on building successful AI agents