
Tutorial: Run inference with Qwen3 235B A22B-2507 Instruct using W&B Inference

Get set up and run Qwen3 235B A22B-2507 Instruct, Qwen's large language model, in Python using W&B Inference.
Created on September 11 | Last edited on September 15
Running inference with Qwen3 235B A22B-2507 Instruct through W&B Inference powered by CoreWeave is quick to set up yet powerful enough for sophisticated use cases. In this tutorial, you’ll see how to set up the model, run inference, and make use of advanced features, while also tracking and troubleshooting your experiments with W&B Weave.
Whether you’re working with long documents, building multilingual systems, or tackling complex reasoning tasks, this guide gives you the tools to use Qwen3 235B A22B-2507 Instruct effectively in your workflow.

What is Qwen3 235B A22B-2507 Instruct?

Qwen3 235B A22B-2507 Instruct is a large-scale language model from Qwen, designed to push state-of-the-art performance across knowledge, reasoning, and coding while maintaining strong alignment.
📚 On academic knowledge tasks, it dominates factual QA. It leads GPQA at 77.5%, SuperGPQA at 62.6%, SimpleQA at 54.3%, and CSimpleQA at 84.3%, outperforming Claude, GPT-4o, Deepseek, Kimi, and the non-instruct Qwen variant. On MMLU it scores 83.0 and 93.1 (Redux), just behind Claude Opus but ahead of most others.
➗ In reasoning-heavy benchmarks, it shows unique strengths. On AIME25 it reaches 70.3%, far above all but the strongest competitors. It also achieves 55.4% on HMMT25 and 41.8% on ARC-AGI, where GPT-4o and Claude struggle. On ZebraLogic, it stands out with 95.0, the best score recorded.
🖥️ For coding, it proves capable but not dominant. On LiveCodeBench v6 it records 51.8%, ahead of GPT-4o and Deepseek. On MultiPL-E it scores 87.9%, and on Aider-Polyglot 57.3%, showing balanced multilingual coding ability.
✍️ In alignment and creative writing, it performs at a high tier. IF Eval comes in at 88.7, Creative Writing v3 at 87.5, and WritingBench at 85.2. On Arena-Hard v2 it posts 79.2%, comfortably ahead of GPT-4o, Claude, and Deepseek, demonstrating both reasoning depth and user preference wins.
🌍 Overall, Qwen3 235B A22B-2507 Instruct establishes itself as one of the strongest publicly benchmarked models. It trails Claude Opus 4 slightly on MMLU, but otherwise leads in QA, reasoning, and logic-heavy tests, making it one of the best general-purpose LLMs available today.


For detailed technical specifications and performance benchmarks, visit the Qwen3 235B A22B-2507 Instruct model documentation.

W&B Weave

W&B Weave goes beyond simple logging; it organizes and visualizes your model runs so you can debug, compare, and refine more effectively.
Getting started is easy: just import the library and initialize it with your project name.
One notable feature is the @weave.op decorator. In standard Python, functions execute without capturing their inputs or outputs. By using @weave.op, each function call is logged automatically, eliminating the need to create your own logging tools or clutter notebooks with print statements.
All logs appear in the Weave dashboard, where you can:
  • View interactive visualizations, timelines, and traces of function calls
  • Drill into details and compare different runs
  • Trace outputs back to inputs for reproducibility
This turns Weave into a strong asset for model development. Rather than dealing with scattered log files, you gain a unified visual record of your experiments, making it easier to debug, replicate results reliably, and fine-tune models such as Qwen3 235B A22B-2507 Instruct with fewer obstacles.

Tutorial: Running inference with Qwen3 235B A22B-2507 Instruct using W&B Inference

We’ll be using the Qwen/Qwen3-235B-A22B-Instruct-2507 model. The examples here assume you’re running inside a Jupyter Notebook, though the code works in any Python environment.
💡 If you're not familiar with Jupyter Notebooks, you can get set up in about five minutes; I walk you through it in this tutorial.

Prerequisites

Before starting, ensure you have:
  • A Weights & Biases account (you can sign up free here)
  • Python 3.7 or higher installed
  • Basic familiarity with Python and API usage
  • Understanding of your use case requirements (document analysis, code review, multilingual tasks, etc.)

Step 1: Installation & setup

1. Install required packages

To get started running inference with Qwen3 235B A22B-2507 Instruct, you only need to install the OpenAI client, W&B, and Weave. We'll also use W&B Weave to streamline the review of multiple outputs, making the process far more efficient.
The code to do this is:
pip install openai wandb weave
Run this in your terminal or in a Jupyter cell (in a notebook, use %pip install openai wandb weave).
When you execute the cell, you'll notice an asterisk ([*]) appear between the brackets [ ]. This indicates that the cell is running, and you'll need to wait until the asterisk turns into a number before proceeding.

2. Get your W&B API key

  1. Visit https://wandb.ai/authorize and copy your API key
  2. Keep it handy for the next step

Step 2: Environment configuration

Set your W&B API key as an environment variable. Choose the method that fits your workflow:

Option 1: In a Jupyter Notebook

# Set environment variables in your notebook
%env WANDB_API_KEY=your-wandb-api-key-here

Option 2: In Terminal/Shell

export WANDB_API_KEY="your-wandb-api-key-here"

Option 3: In Python script

import os
# Set environment variables programmatically
os.environ["WANDB_API_KEY"] = "your-wandb-api-key-here"
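Whichever option you use, it's worth confirming the key is actually visible to your Python process before initializing Weave. A minimal sanity check:

```python
import os

# True if the environment variable is set and non-empty
key_is_set = bool(os.getenv("WANDB_API_KEY"))
print("WANDB_API_KEY set:", key_is_set)
```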

Step 3: Running basic inference with Qwen3 235B A22B-2507 Instruct

Here’s a simple example of running inference with Qwen3 235B A22B-2507 Instruct.
import os
import openai
import weave

PROJECT = "wandb_inference"
weave.init(PROJECT)

client = openai.OpenAI(
    base_url="https://api.inference.wandb.ai/v1",
    api_key=os.getenv("WANDB_API_KEY"),
    project=PROJECT,
    default_headers={
        "OpenAI-Project": "wandb_fc/quickstart_playground"  # replace with your actual team/project
    }
)

resp = client.chat.completions.create(
    model="Qwen/Qwen3-235B-A22B-Instruct-2507",
    messages=[
        {"role": "system", "content": "You are a clear, friendly science explainer."},
        {"role": "user", "content": "In plain language, outline the core ideas behind quantum computing—qubits, superposition, entanglement, and gates. Give one real-world application and one current limitation."}
    ],
    temperature=0.6,
    max_tokens=900,
)

print(resp.choices[0].message.content)
You'll find the inputs and outputs recorded to your Weave dashboard with the parameters automatically included:


Step 4: Advanced Qwen3 235B A22B-2507 Instruct inference configuration

Understanding inference parameters

You can adjust Qwen3 235B A22B-2507 Instruct’s response behavior using these inference parameters and compare the results in Weave.
import os
import openai
import weave

PROJECT = "wandb_inference"
weave.init(PROJECT)

client = openai.OpenAI(
    base_url="https://api.inference.wandb.ai/v1",
    api_key=os.getenv("WANDB_API_KEY"),
    project=PROJECT,
    default_headers={
        "OpenAI-Project": "wandb_fc/quickstart_playground"  # replace with your actual team/project
    }
)

resp = client.chat.completions.create(
    model="Qwen/Qwen3-235B-A22B-Instruct-2507",
    messages=[
        {"role": "system", "content": "You are a creative writing coach with a focus on sci‑fi tone and pacing."},
        {"role": "user", "content": "Draft a tight 600–800 word time‑travel story with strong characters, a twist, and a reflective note on cause/effect. Keep prose lean and vivid."}
    ],
    temperature=0.75,
    top_p=0.9,
    max_tokens=1600,
)
print(resp.choices[0].message.content)

Parameter Guidelines:
  • Temperature: Use 0.1-0.3 for analytical tasks, 0.7-0.9 for creative work
  • Top_p: Combine with temperature; 0.9 works well for most applications
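One way to keep these guidelines consistent across a codebase is a small helper that returns sampling settings by task type, which you can then splat into the request, e.g. client.chat.completions.create(..., **params_for("analysis")). The helper name and exact cutoffs below are our own convention, not part of the API:

```python
def params_for(task: str) -> dict:
    """Suggested sampling settings following the guidelines above."""
    if task in ("analysis", "code-review", "extraction"):
        return {"temperature": 0.2, "top_p": 0.9}  # precise, repeatable
    if task in ("creative", "brainstorm"):
        return {"temperature": 0.8, "top_p": 0.9}  # varied, exploratory
    return {"temperature": 0.6, "top_p": 0.9}      # balanced default

print(params_for("analysis"))  # → {'temperature': 0.2, 'top_p': 0.9}
```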
We also show how to stream responses for a more interactive experience, which is ideal for chatbots or applications with long outputs.


Streaming inference responses

For real-time output and better user experience:
import os
import sys
import openai
import weave

PROJECT = "wandb_inference"
weave.init(PROJECT)

client = openai.OpenAI(
    base_url="https://api.inference.wandb.ai/v1",
    api_key=os.getenv("WANDB_API_KEY"),
    project=PROJECT,
    default_headers={
        "OpenAI-Project": "wandb_fc/quickstart_playground"  # replace with your actual team/project
    }
)

stream = client.chat.completions.create(
    model="Qwen/Qwen3-235B-A22B-Instruct-2507",
    messages=[
        {"role": "system", "content": "You are a space history narrator with a balanced, insightful tone."},
        {"role": "user", "content": "Tell the story of space exploration—from early dreams to present—touching on key missions, innovators, international cooperation, big challenges, and why it matters."}
    ],
    stream=True,
    temperature=0.6,
)

sys.stdout.write("Response: ")
for chunk in stream:
    delta = chunk.choices[0].delta
    if delta and delta.content:
        sys.stdout.write(delta.content)
        sys.stdout.flush()
print()
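If you also need the complete text after the stream finishes (to save, post-process, or log it yourself), collect the deltas while printing. The sketch below stubs the chunk objects with SimpleNamespace so the accumulation logic is clear; with the real API you would iterate the stream from the example above instead:

```python
from types import SimpleNamespace

def collect_stream(chunks) -> str:
    """Assemble the full response text from streamed deltas."""
    parts = []
    for chunk in chunks:
        delta = chunk.choices[0].delta
        if delta and delta.content:
            parts.append(delta.content)
    return "".join(parts)

# Stubbed chunks standing in for the API's streaming response
fake_stream = [
    SimpleNamespace(choices=[SimpleNamespace(delta=SimpleNamespace(content=t))])
    for t in ("Space ", "exploration ", "began with a dream.")
]
print(collect_stream(fake_stream))  # → Space exploration began with a dream.
```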

We got a streaming response:

With the metrics logged to Weave:

As well as the full output:


Running inference with Qwen3 235B A22B-2507 Instruct's unique capabilities

Qwen3 235B A22B-2507 Instruct shines in a few specialized areas:

Long context inference

Qwen3 235B A22B-2507 Instruct excels at running inference on extensive documents. Here's a practical example:
import io
import requests
import openai
import weave
from pypdf import PdfReader
import os

PROJECT = "wandb_inference"
weave.init(PROJECT)

PDF_URL = "https://docs.aws.amazon.com/pdfs/bedrock-agentcore/latest/devguide/bedrock-agentcore-dg.pdf"
QUESTION = "Summarize how AgentCore's memory architecture functions and when to use it."

client = openai.OpenAI(
    base_url="https://api.inference.wandb.ai/v1",
    api_key=os.getenv("WANDB_API_KEY"),
    project=PROJECT,
    default_headers={
        "OpenAI-Project": "wandb_fc/quickstart_playground"  # replace with your actual team/project
    }
)

r = requests.get(PDF_URL, timeout=60)
r.raise_for_status()

reader = PdfReader(io.BytesIO(r.content))
pages = reader.pages[:100]
text = "\n\n".join(page.extract_text() or "" for page in pages)

doc_snippet = text

prompt = (
    f"Using the provided AWS Bedrock AgentCore doc, answer: {QUESTION}\n\n"
    f"Documentation:\n{doc_snippet}\n\n"
    "Cite quotes where relevant."
)

resp = client.chat.completions.create(
    model="Qwen/Qwen3-235B-A22B-Instruct-2507",
    messages=[
        {"role": "system", "content": "You analyze the given text only. If info is missing, say so."},
        {"role": "user", "content": prompt}
    ],
    temperature=0.25,
    max_tokens=1400,
)

print(resp.choices[0].message.content)
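The example above sends the full extracted text; very long PDFs can exceed the model's context window. A pragmatic safeguard is a character-budget truncation before building the prompt (the 400,000-character default here is an illustrative assumption, not a documented limit):

```python
def truncate_for_context(text: str, max_chars: int = 400_000) -> str:
    """Trim text to a rough character budget, marking the cut in-band."""
    if len(text) <= max_chars:
        return text
    return text[:max_chars] + "\n\n[TRUNCATED: document exceeded budget]"

# Short input passes through unchanged
print(truncate_for_context("hello", max_chars=10))  # → hello
```

You would apply it as doc_snippet = truncate_for_context(text) before formatting the prompt.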

Which outputs to Weave:


Multilingual inference

Take advantage of Qwen3 235B A22B-2507 Instruct’s multilingual inference abilities to support global development:
import os
import openai
import weave

PROJECT = "wandb_inference"
weave.init(PROJECT)

client = openai.OpenAI(
    base_url="https://api.inference.wandb.ai/v1",
    api_key=os.getenv("WANDB_API_KEY"),
    project=PROJECT,
    default_headers={
        "OpenAI-Project": "wandb_fc/quickstart_playground"  # replace with your actual team/project
    }
)

code_snippet = """
// JavaScript function with Chinese identifiers and mixed language comments
function 计算总价(商品列表, 折扣率) {
    let 总价 = 0; // Total price accumulator
    for (const 商品 of 商品列表) {
        总价 += 商品.价格 * 商品.数量 // price * quantity
    }
    const 折扣金额 = 总价 * 折扣率 // discount amount
    return 总价 - 折扣金额 // final price
}

# Python with Chinese docstring and mixed naming
def validate_用户输入(user_data):
    '''
    验证用户输入数据的完整性和有效性
    Validates user input data for completeness and validity
    '''
    required_fields = ['name', 'email', '年龄']
    for field in required_fields:
        if field not in user_data:
            raise ValueError(f"Missing required field: {field}")
    return True
"""

task = (
    "Analyze this code: 1) Explain in English+Chinese 2) Flag issues 3) Suggest improvements 4) Refactor with consistent naming."
)

resp = client.chat.completions.create(
    model="Qwen/Qwen3-235B-A22B-Instruct-2507",
    messages=[
        {"role": "system", "content": "You are a bilingual software architect (EN/中文)."},
        {"role": "user", "content": f"{task}\n\nCode:\n{code_snippet}"}
    ],
    temperature=0.2,
    max_tokens=1100,
)

print(resp.choices[0].message.content)

Which logs to Weave as:


Complex multi-step reasoning inference with Qwen3 235B A22B-2507 Instruct

Apply Qwen3 235B A22B-2507 Instruct’s reasoning-driven inference to tackle complex problem solving:
import openai
import weave
import os

PROJECT = "wandb_inference"
weave.init(PROJECT)

client = openai.OpenAI(
    base_url="https://api.inference.wandb.ai/v1",
    api_key=os.getenv("WANDB_API_KEY"),
    project=PROJECT,
    default_headers={
        "OpenAI-Project": "wandb_fc/quickstart_playground"  # replace with your actual team/project
    }
)

resp = client.chat.completions.create(
    model="Qwen/Qwen3-235B-A22B-Instruct-2507",
    messages=[
        {"role": "system", "content": "You are a pragmatic SaaS strategist focusing on retention and efficient growth."},
        {"role": "user", "content": "Given churn 15%, CAC $150, CLV $800, ARPU $50, 2k customers, $60k marketing—provide: benchmarks check, top 3 problems, action plan, 90‑day timeline, and success KPIs."}
    ],
    temperature=0.65,
    max_tokens=900,
)

print(resp.choices[0].message.content)
Which you'll see in the dashboard:


Monitoring Qwen3 235B A22B-2507 Instruct inference with W&B Weave

Once initialized, Weave automatically logs all inference API calls. You’ll have access to:
  • Request details: model, parameters, token counts
  • Response data: outputs, runtime, status
  • Usage metrics: tokens consumed, costs, rate limits
  • Performance: latency and throughput patterns
You can access your logs in the W&B dashboard, filter by run, and analyze patterns. Adding custom annotations helps organize logs by use case or experiment.

Custom Weave annotations

Add custom metadata and organize your API calls:
import os
import openai
import weave

PROJECT = "wandb_inference"
weave.init(PROJECT)

client = openai.OpenAI(
    base_url="https://api.inference.wandb.ai/v1",
    api_key=os.getenv("WANDB_API_KEY"),
    project=PROJECT,
    default_headers={
        "OpenAI-Project": "wandb_fc/quickstart_playground"  # replace with your actual team/project
    }
)

@weave.op()
def analyze_customer_feedback(feedback_text, sentiment_threshold=0.5):
    """Return sentiment (‑1..+1), themes, risks, and next actions."""
    resp = client.chat.completions.create(
        model="Qwen/Qwen3-235B-A22B-Instruct-2507",
        messages=[
            {"role": "system", "content": "You assess customer feedback for sentiment and themes; be concise and actionable."},
            {"role": "user", "content": f"Analyze: {feedback_text}\n\nThreshold: {sentiment_threshold}. Return: score, themes, concerns, actions."}
        ],
        temperature=0.1,
        max_tokens=500,
    )
    return resp.choices[0].message.content

if __name__ == "__main__":
    out = analyze_customer_feedback(
        "Core features are solid, but the latest UI update slowed things down and added friction.",
        sentiment_threshold=0.3,
    )
    print(out)
Which would appear as:


Best Practices

🔐 Security & Configuration

  • Keep API keys in environment variables rather than embedding them in code.
  • Choose clear and descriptive project names (for example, team/project).
  • Limit API key permissions to only what is required.

✍️ Prompt Engineering

  • Make use of Qwen3 235B A22B-2507 Instruct’s extended context support.
  • Define the output format and style you want.
  • Provide thorough system messages to guide context and tone.
  • Tune the temperature setting: lower for analysis, higher for creativity.

⚡ Performance Optimization

  • Turn on streaming for lengthy outputs.
  • Group similar requests together to reduce time and cost.
  • Track token usage to maintain efficiency.
  • Reuse results by caching frequent queries.

📊 Monitoring & Debugging

  • Rely on Weave’s automatic logging for every production call.
  • Attach metadata annotations to keep experiments organized.
  • Check failed requests often to spot problems early.
  • Monitor latency and fine-tune configurations to maintain stable performance.

Next steps

Now that you’ve mastered the basics of Qwen3 235B A22B-2507 Instruct:
🔗 Explore advanced features → Review W&B Inference docs and experiment with Weave’s evaluation tools.
📊 Optimize workflows → Create monitoring dashboards, conduct A/B testing for prompts, and develop metrics tailored to your domain.
🚀 Scale deployments → Set up reliable production pipelines, reduce costs through optimization, and connect with other W&B tools.
📚 Deepen your knowledge → Review the Qwen3 235B A22B-2507 Instruct Model Card, look through community examples, and keep up with the latest updates.
Iterate on AI agents and models faster. Try Weights & Biases today.