
Tutorial: Running inference with DeepSeek R1-0528 using W&B Inference

Get set up and run DeepSeek R1-0528, DeepSeek's advanced long-context reasoning model, in Python using W&B Inference.
Get up and running with DeepSeek R1-0528, DeepSeek’s long-context reasoning model, using Python and W&B Inference powered by CoreWeave. This tutorial walks you through setup, basic usage, and more advanced configurations while showing how to capture everything with W&B Weave.
Whether you’re summarizing huge documents, building multilingual apps, or tackling complex reasoning, you’ll find the essentials here for running DeepSeek R1-0528 effectively via W&B Inference.

What is DeepSeek R1-0528?

DeepSeek R1-0528 is a large language model from DeepSeek designed for advanced reasoning over very long context windows. Highlights include:
🧮 Mathematical reasoning: Scores 91.4% on AIME 2024 and 87.5% on AIME 2025, with nearly twice the reasoning token depth of the prior R1 release.
🧑‍💻 Coding performance: Strong showing on LiveCodeBench (73.3%) and Aider-Polyglot (71.6%), narrowing the gap with top closed models—useful for software engineering workflows.
📊 General benchmarks: Exceeds 81% on GPQA-Diamond and 85%+ on MMLU-Pro, indicating solid broad reasoning and knowledge.
⚙️ Architecture and inference: Supports up to 161K tokens of context, includes post-training algorithmic improvements, and offers FP4 quantized variants for efficient deployment with minimal quality impact.
✨ New capabilities: Lower hallucination rates, structured JSON output, function calling, and smoother integrations for coding assistants and reasoning-heavy apps.
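To make the structured-output point concrete, here is a minimal sketch of requesting JSON. It reuses the OpenAI-compatible client configured in Step 3 below; whether this endpoint enforces the response_format parameter is an assumption on our part, so the system prompt asks for JSON as well:
resp = client.chat.completions.create(
    model="deepseek-ai/DeepSeek-R1-0528",
    messages=[
        {"role": "system", "content": "Reply with a single valid JSON object and nothing else."},
        {"role": "user", "content": "From 'Revenue grew 12% in Q3', extract keys: metric, change, period."}
    ],
    # response_format is the OpenAI-compatible way to request JSON output;
    # endpoint support is an assumption here, hence the system prompt above.
    response_format={"type": "json_object"},
    temperature=0.1,
    max_tokens=300,
)
print(resp.choices[0].message.content)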


For detailed technical specifications and performance benchmarks, visit the DeepSeek R1-0528 model documentation.

W&B Weave

W&B Weave makes it simple to track and analyze your model calls. Start by importing Weave and initializing it with your project name.
A standout feature is the @weave.op decorator. In Python, decorators extend a function’s behavior. Adding @weave.op above a function instructs Weave to log that function’s inputs and outputs automatically. This keeps a clear record of what went in and what came out.
After your code runs, you’ll see these logs in the Weave dashboard with visualizations and call traces. This streamlines debugging and helps organize experiments—especially useful when iterating on models like DeepSeek R1-0528.
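As a minimal sketch of the mechanics (using a toy function rather than a model call, so you can see the decorator in isolation):
import weave

weave.init("wandb_inference")  # logs will appear under this project name

@weave.op()
def explain(topic: str) -> str:
    # Weave records this function's inputs and outputs automatically.
    return f"A short explanation of {topic}."

explain("quantum computing")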

Tutorial: Running inference with DeepSeek R1-0528 using W&B Inference

Let’s jump in. The examples below use a Jupyter notebook (as you may notice in a few screenshots), but the code works in other environments too.
We will be running inference with the deepseek-ai/DeepSeek-R1-0528 model specifically.
💡 If you're not familiar with Jupyter Notebooks, you can get set up in about 5 minutes. I walk you through it in this tutorial.

Prerequisites

Before you begin, make sure you have:
  • A Weights & Biases account (you can sign up free here)
  • Python 3.7 or higher installed
  • Basic familiarity with Python and API usage
  • Understanding of your use case requirements (document analysis, code review, multilingual tasks, etc.)

Step 1: Installation & setup

1. Install required packages

To run inference with DeepSeek R1-0528, install the openai, wandb, and weave packages. We'll also show how Weave helps compare outputs quickly and consistently.
The code to do this is:
pip install openai wandb weave
Run this in your terminal or a Jupyter cell.
When the cell executes, you’ll see an asterisk ([*]) inside the brackets—this means it’s running. Wait for it to change to a number before moving on.

2. Get your W&B API key

  1. Navigate to https://wandb.ai/authorize
  2. Copy your API key
  3. Keep it handy for the next step

Step 2: Environment configuration

Setting environment variables keeps things secure and convenient. You’ll need your W&B API key.

Option 1: In a Jupyter Notebook

# Set environment variables in your notebook
%env WANDB_API_KEY=your-wandb-api-key-here

Option 2: In Terminal/Shell

export WANDB_API_KEY="your-wandb-api-key-here"

Option 3: In Python script

import os
# Set environment variables programmatically
os.environ["WANDB_API_KEY] = "your-wandb-api-key-here"

Step 3: Running basic inference with DeepSeek R1-0528

If everything’s set up, here’s where it gets fun.
Use this minimal example to create a completion with DeepSeek R1-0528:
import os
import openai
import weave

PROJECT = "wandb_inference"
weave.init(PROJECT)

client = openai.OpenAI(
    base_url="https://api.inference.wandb.ai/v1",
    api_key=os.getenv("WANDB_API_KEY"),
    project=PROJECT,
    default_headers={
        "OpenAI-Project": "wandb_fc/quickstart_playground"  # replace with your actual team/project
    }
)

resp = client.chat.completions.create(
    model="deepseek-ai/DeepSeek-R1-0528",
    messages=[
        {"role": "system", "content": "You are a helpful assistant specialized in clear explanations."},
        {"role": "user", "content": "Explain quantum computing in simple terms."}
    ],
    temperature=0.7,
    max_tokens=1000,
)

print(resp.choices[0].message.content)
You'll find the inputs and outputs recorded to your Weave dashboard with the parameters automatically included:


Step 4: Advanced DeepSeek R1-0528 inference configuration

Understanding inference parameters

Tweak DeepSeek R1-0528’s responses using these common parameters (experiment and compare outcomes in Weave!).
import os
import openai
import weave

PROJECT = "wandb_inference"
weave.init(PROJECT)

client = openai.OpenAI(
    base_url="https://api.inference.wandb.ai/v1",
    api_key=os.getenv("WANDB_API_KEY"),
    project=PROJECT,
    default_headers={
        "OpenAI-Project": "wandb_fc/quickstart_playground"  # replace with your actual team/project
    }
)

resp = client.chat.completions.create(
    model="deepseek-ai/DeepSeek-R1-0528",
    messages=[
        {"role": "system", "content": "You are a creative writing assistant."},
        {"role": "user", "content": "Write a short story about time travel."}
    ],
    temperature=0.8,
    top_p=0.9,
    max_tokens=2000,
)
print(resp.choices[0].message.content)

Parameter Guidelines:
  • Temperature: Use 0.1-0.3 for analytical tasks, 0.7-0.9 for creative work
  • Top_p: Combine with temperature; 0.9 works well for most applications
This gives us added flexibility to influence our model output. These parameters are also automatically logged to W&B Weave for observability:


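If you want to compare settings side by side, here is a quick sketch reusing the client configured above. Both calls are logged to Weave, so you can inspect each output next to its parameters:
prompt = "Write a two-sentence product description for a smart kettle."

for temperature in (0.2, 0.9):
    resp = client.chat.completions.create(
        model="deepseek-ai/DeepSeek-R1-0528",
        messages=[{"role": "user", "content": prompt}],
        temperature=temperature,
        max_tokens=200,
    )
    # Print each variant so the difference in tone is easy to compare.
    print(f"--- temperature={temperature} ---")
    print(resp.choices[0].message.content)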
Streaming inference responses

For real-time output and better user experience:
import os
import sys
import openai
import weave

PROJECT = "wandb_inference"
weave.init(PROJECT)

client = openai.OpenAI(
    base_url="https://api.inference.wandb.ai/v1",
    api_key=os.getenv("WANDB_API_KEY"),
    project=PROJECT,
    default_headers={
        "OpenAI-Project": "wandb_fc/quickstart_playground"  # replace with your actual team/project
    }
)

stream = client.chat.completions.create(
    model="deepseek-ai/DeepSeek-R1-0528",
    messages=[
        {"role": "system", "content": "You are a helpful assistant."},
        {"role": "user", "content": "Tell me a comprehensive story about space exploration."}
    ],
    stream=True,
    temperature=0.7,
)

sys.stdout.write("Response: ")
for chunk in stream:
    delta = chunk.choices[0].delta
    if delta and delta.content:
        sys.stdout.write(delta.content)
        sys.stdout.flush()
print()

We got a streaming response:

With the metrics logged to Weave:

As well as the full output:


Step 5: Running inference with DeepSeek R1-0528's unique capabilities

This is where DeepSeek R1-0528 stands out. Let’s try a few capabilities.

Long context inference

DeepSeek R1-0528 handles very large documents. For example:
import os
import io
import requests
import openai
import weave
from pypdf import PdfReader

PROJECT = "wandb_inference"
weave.init(PROJECT)

PDF_URL = "https://docs.aws.amazon.com/pdfs/bedrock-agentcore/latest/devguide/bedrock-agentcore-dg.pdf"
QUESTION = "How does AgentCore memory work?"

client = openai.OpenAI(
    base_url="https://api.inference.wandb.ai/v1",
    api_key=os.getenv("WANDB_API_KEY"),
    project=PROJECT,
    default_headers={
        "OpenAI-Project": "wandb_fc/quickstart_playground"  # replace with your actual team/project
    }
)

r = requests.get(PDF_URL, timeout=60)
r.raise_for_status()

reader = PdfReader(io.BytesIO(r.content))
pages = reader.pages[:100]  # cap at the first 100 pages to stay within the context window
text = "\n\n".join(page.extract_text() or "" for page in pages)

doc_snippet = text

prompt = (
    "You analyze AWS Bedrock AgentCore docs and answer using only the provided text. "
    "If something is not in the text, say you cannot find it.\n\n"
    f"Document:\n{doc_snippet}\n\nQuestion: {QUESTION}\n"
    "Cite exact phrases where possible."
)

resp = client.chat.completions.create(
    model="deepseek-ai/DeepSeek-R1-0528",
    messages=[
        {"role": "system", "content": "You are an expert on AWS Bedrock AgentCore."},
        {"role": "user", "content": prompt}
    ],
    temperature=0.2,
    max_tokens=1500,
)

print(resp.choices[0].message.content)
Which outputs to Weave:


Multilingual inference

Leverage DeepSeek R1-0528's multilingual capabilities for international, mixed-language codebases:
import os
import openai
import weave

PROJECT = "wandb_inference"
weave.init(PROJECT)

client = openai.OpenAI(
    base_url="https://api.inference.wandb.ai/v1",
    api_key=os.getenv("WANDB_API_KEY"),
    project=PROJECT,
    default_headers={
        "OpenAI-Project": "wandb_fc/quickstart_playground"  # replace with your actual team/project
    }
)

code_snippet = """
// Chinese identifiers with English comments
function 计算总价(商品列表, 折扣率) {
    let 总价 = 0;
    for (const 商品 of 商品列表) {
        总价 += 商品.价格 * 商品.数量
    }
    const 折扣金额 = 总价 * 折扣率
    return 总价 - 折扣金额
}

# Python with Chinese docstring
def validate_用户输入(user_data):
    '''
    验证用户输入数据的完整性和有效性
    Validates user input data for completeness and validity
    '''
    required_fields = ['name', 'email', '年龄']
    for field in required_fields:
        if field not in user_data:
            raise ValueError(f"Missing required field: {field}")
    return True
"""

task = (
    "Explain in English what the code does and provide a concise Chinese explanation. "
    "Then suggest improvements for naming consistency and error handling. "
    "Provide a refactored version using one language for identifiers."
)

resp = client.chat.completions.create(
    model="deepseek-ai/DeepSeek-R1-0528",
    messages=[
        {"role": "system", "content": "You are a senior engineer fluent in English and Chinese."},
        {"role": "user", "content": f"{task}\n\nCode:\n{code_snippet}"}
    ],
    temperature=0.2,
    max_tokens=1200,
)

print(resp.choices[0].message.content)

Which logs to Weave as:


Complex multi-step reasoning inference with DeepSeek R1-0528

Use DeepSeek R1-0528's reasoning capabilities for complex, multi-step problem solving:
import os
import openai
import weave

PROJECT = "wandb_inference"
weave.init(PROJECT)

client = openai.OpenAI(
    base_url="https://api.inference.wandb.ai/v1",
    api_key=os.getenv("WANDB_API_KEY"),
    project=PROJECT,
    default_headers={
        "OpenAI-Project": "wandb_fc/quickstart_playground"  # replace with your actual team/project
    }
)

resp = client.chat.completions.create(
    model="deepseek-ai/DeepSeek-R1-0528",
    messages=[
        {"role": "system", "content": "You are a helpful assistant specialized in SaaS growth strategy."},
        {"role": "user", "content": """
Our SaaS company is experiencing a 15% monthly churn rate. Our customer acquisition cost (CAC) is $150,
average customer lifetime value (CLV) is $800, and monthly recurring revenue per customer is $50.
We have 2,000 active customers and are spending $60,000/month on marketing.
Please analyze this situation and provide a comprehensive strategy to improve our metrics,
including specific actions, expected timelines, and success metrics.
"""}
    ],
    temperature=0.7,
    max_tokens=1000,
)

print(resp.choices[0].message.content)
Which you'll see in the dashboard:


Monitoring DeepSeek R1-0528 inference with W&B Weave

From the final cell, you can view the model output and copy it as needed. To dig deeper, or to review previous requests, open your Weights & Biases project or follow the links printed with the response.
With Weave initialized, your inference calls are tracked automatically. Here’s what gets recorded and how to use it:

What Weave tracks automatically

  • Request details: Model used, parameters, token counts
  • Response data: Content, processing time, success/failure status
  • Usage metrics: Token consumption, API costs, rate limit status (see the snippet after this list)
  • Performance: Response latency, throughput patterns
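The same usage numbers are also available on the response object itself. OpenAI-compatible endpoints generally return a usage field, so assuming W&B Inference does too, you can read it directly (reusing the client from the earlier examples):
resp = client.chat.completions.create(
    model="deepseek-ai/DeepSeek-R1-0528",
    messages=[{"role": "user", "content": "One-line summary of W&B Weave."}],
    max_tokens=100,
)
# Token counts for this single call, as reported by the API.
print(resp.usage.prompt_tokens, resp.usage.completion_tokens, resp.usage.total_tokens)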

Accessing your logs

  • Visit your W&B project dashboard at: https://wandb.ai/[your-username]/[your-project]
  • Navigate to the "Weave" section
  • View detailed logs, filter by date/model/status
  • Analyze usage patterns and optimize accordingly

Custom Weave annotations

Add custom metadata and organize your API calls:
import os
import openai
import weave

PROJECT = "wandb_inference"
weave.init(PROJECT)

client = openai.OpenAI(
    base_url="https://api.inference.wandb.ai/v1",
    api_key=os.getenv("WANDB_API_KEY"),
    project=PROJECT,
    default_headers={
        "OpenAI-Project": "wandb_fc/quickstart_playground"  # replace with your actual team/project
    }
)

@weave.op()
def analyze_customer_feedback(feedback_text, sentiment_threshold=0.5):
    """
    Analyze feedback and return a sentiment summary.
    Decorated with @weave.op so the function's inputs, outputs, and the
    nested model call are grouped into a single traced operation.
    """
    resp = client.chat.completions.create(
        model="deepseek-ai/DeepSeek-R1-0528",
        messages=[
            {"role": "system", "content": "You score sentiment from -1 to 1 and list key topics."},
            {"role": "user", "content": f"Feedback: {feedback_text}\nThreshold: {sentiment_threshold}"}
        ],
        temperature=0.1,
        max_tokens=500,
    )
    return resp.choices[0].message.content

if __name__ == "__main__":
    out = analyze_customer_feedback(
        "The new update is confusing and slow. I cannot find the features I used daily.",
        sentiment_threshold=0.3,
    )
    print(out)
Which would appear as:


Best Practices

Here are some best practices to follow when testing or deploying DeepSeek R1-0528 (or any other model, for that matter).

Security and Configuration

  • Environment variables: Always store API keys in environment variables, never hardcode them
  • Project organization: Use clear, descriptive project names following the "team/project" format
  • Access control: Limit API key permissions to necessary scopes only

Prompt Engineering for DeepSeek R1-0528

  • Leverage long context: Don't hesitate to provide extensive context; DeepSeek R1-0528 handles it well
  • Clear instructions: Be specific about the desired output format and style
  • System messages: Use detailed system prompts to establish expertise and context
  • Temperature selection: Lower values (0.1-0.3) for analytical tasks, higher (0.7-0.9) for creative work

Performance Optimization

  • Streaming: Use streaming for longer responses to improve user experience
  • Batch processing: Group similar requests when possible to improve efficiency
  • Token management: Monitor token usage to optimize costs and stay within limits
  • Caching: Implement response caching for frequently requested analyses (a minimal sketch follows this list)
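Here is a minimal in-memory caching sketch, assuming the client configured earlier. The cache key covers the model, messages, and parameters, so only byte-for-byte identical requests are reused:
import hashlib
import json

_cache = {}  # maps request-hash -> response text

def cached_completion(messages, model="deepseek-ai/DeepSeek-R1-0528", **params):
    # Build a stable key over everything that affects the response.
    key = hashlib.sha256(
        json.dumps({"model": model, "messages": messages, **params}, sort_keys=True).encode()
    ).hexdigest()
    if key not in _cache:
        resp = client.chat.completions.create(model=model, messages=messages, **params)
        _cache[key] = resp.choices[0].message.content
    return _cache[key]

For production use, swap the dict for a persistent store, and skip caching when you run at high temperature and actually want varied outputs.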

Monitoring and Debugging

  • Weave integration: Use Weave's automatic logging for all production calls
  • Custom annotations: Add meaningful metadata to track different use cases
  • Error analysis: Regularly review failed requests to identify patterns
  • Performance tracking: Monitor response times and adjust parameters accordingly

Next steps

Now that you're equipped with comprehensive DeepSeek R1-0528 knowledge:
🔗 Explore Advanced Features
📊 Optimize Your Workflow
  • Set up automated monitoring dashboards for your specific use cases
  • Implement A/B testing between different prompting strategies
  • Create custom evaluation metrics for your domain-specific tasks
🚀 Scale Your Implementation
  • Build production pipelines with proper error handling and monitoring
  • Implement cost optimization strategies based on usage patterns
  • Explore integration with other W&B tools for end-to-end ML workflows
📚 Dive Deeper into DeepSeek R1-0528
  • Visit the DeepSeek R1-0528 model card for detailed capability information
  • Explore community examples and use cases
  • Stay updated with model improvements and new features
With this comprehensive setup, you're ready to harness DeepSeek R1-0528's advanced capabilities while maintaining professional-grade monitoring, logging, and error handling through W&B Inference and Weave.
Iterate on AI agents and models faster. Try Weights & Biases today.