
Tutorial: Run inference with Qwen3-Coder using W&B Inference

Get set up and run Qwen3-Coder, Qwen's code-focused large language model, in Python using W&B Inference.
Running inference with Qwen3-Coder (a.k.a. Qwen3-Coder 480B A35B Instruct) through W&B Inference powered by CoreWeave is quick to set up yet powerful enough for sophisticated use cases. In this tutorial, you’ll see how to set up the model, perform inference, and make use of advanced features, while also tracking and troubleshooting your experiments with W&B Weave.
Whether you’re working with long documents, building multilingual systems, or tackling complex reasoning tasks, this guide gives you the tools to use Qwen3-Coder effectively in your workflow.

What is Qwen3-Coder?

Qwen3-Coder is a large-scale code-specialized language model from Qwen, designed to deliver top-tier performance across software engineering, reasoning, and multilingual programming while staying tightly aligned with developer workflows.
🖥️ On coding and software engineering tasks, it establishes itself as one of the strongest open models. On SWE-Bench Verified with OpenHands it scores 67.0% (100 turns) and 69.6% (500 turns), competitive with Claude 4 Sonnet and ahead of GPT-4.1. On Aider-Polyglot it reaches 61.8%, surpassing Claude and GPT-4.1. In live settings, it posts 26.3% on SWE-Bench Live, ahead of Kimi and DeepSeek, and it records 54.7% on SWE-Bench Multilingual, outperforming most competitors.
➗ In reasoning-heavy software benchmarks, it shows balanced strength. It achieves 25.8% on Multi-SWE-bench mini and 27.0% on Multi-SWE-bench flash, narrowly edging out Claude Sonnet and far above DeepSeek. On Spider2, which tests complex database reasoning, it posts 31.1%, tied with Claude and nearly double GPT-4.1.
🌍 For multilingual programming, Qwen3-Coder demonstrates robust ability. Its 54.7% score on SWE-Bench Multilingual places it above Claude and well ahead of GPT-4.1, while its performance on Aider-Polyglot confirms its adaptability across diverse languages and coding environments.
📊 Overall, Qwen3-Coder delivers state-of-the-art open-model performance in software engineering, combining long-context reasoning, strong tool-use, and competitive multilingual coding ability. It rivals or surpasses closed models like Claude and GPT-4.1 in many coding benchmarks, making it one of the best developer-focused models available today.
For detailed technical specifications and performance benchmarks, visit the Qwen3-Coder model documentation.

W&B Weave

W&B Weave goes beyond simple logging; it organizes and visualizes your model runs so you can debug, compare, and refine more effectively.
Getting started is easy: just import the library and initialize it with your project name.
One notable feature is the @weave.op decorator. In standard Python, functions execute without capturing their inputs or outputs. By using @weave.op, each function call is logged automatically, eliminating the need to create your own logging tools or clutter notebooks with print statements.
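As a quick illustration (shorten is a hypothetical helper, and the project name is the one used later in this tutorial), decorating a plain function is all it takes to start tracing it:
import weave

weave.init("wandb_inference")

@weave.op  # every call to this function is now logged to Weave
def shorten(text: str, limit: int = 80) -> str:
    """Hypothetical helper: truncate text for display."""
    return text if len(text) <= limit else text[:limit] + "..."

shorten("Weave records this call's inputs, output, and timing automatically.")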
All logs appear in the Weave dashboard, where you can:
  • View interactive visualizations, timelines, and traces of function calls
  • Drill into details and compare different runs
  • Trace outputs back to inputs for reproducibility
This makes Weave a powerful tool for model development. Instead of handling scattered log files, you get a unified visual record of your experiments, which simplifies debugging, ensures reproducible results, and helps fine-tune models like Qwen3-Coder with fewer hurdles.

Tutorial: Running inference with Qwen3-Coder using W&B Inference

We’ll be using the Qwen/Qwen3-Coder-480B-A35B-Instruct model. The examples here assume you’re running inside a Jupyter Notebook, though the code works in any Python environment.
💡 If you're not familiar with Jupyter Notebooks, you can get set up in about five minutes; I walk you through it in this tutorial.

Prerequisites

Before starting, ensure you have:
  • A Weights & Biases account (you can sign up free here)
  • Python 3.7 or higher installed
  • Basic familiarity with Python and API usage
  • Understanding of your use case requirements (document analysis, code review, multilingual tasks, etc.)

Step 1: Installation & setup

1. Install required packages

To get started running inference with Qwen3-Coder, all you need to install are the openai, wandb, and weave packages. We’ll also show you how to streamline the review of multiple outputs with W&B Weave, making the process far more efficient.
The code to do this is:
pip install openai wandb weave
Run this command in your terminal or in a Jupyter cell.
When you execute the cell, you'll notice an asterisk ([*]) appear between the brackets. This means the cell is still running; wait until the asterisk turns into a number before proceeding.

2. Get your W&B API key

  1. Go to https://wandb.ai/authorize
  2. Copy your API key
  3. Keep it handy for the next step

Step 2: Environment configuration

Set your W&B API key as an environment variable. Choose the method that fits your workflow:

Option 1: In a Jupyter Notebook

# Set environment variables in your notebook
%env WANDB_API_KEY=your-wandb-api-key-here

Option 2: In Terminal/Shell

export WANDB_API_KEY="your-wandb-api-key-here"

Option 3: In Python script

import os
# Set environment variables programmatically
os.environ["WANDB_API_KEY"] = "your-wandb-api-key-here"

Step 3: Writing a sorting algorithm with Qwen3-Coder

Here’s a simple example of writing a sorting algorithm with Qwen3-Coder.
import os
from pathlib import Path
import openai
import weave

PROJECT = "wandb_inference"
weave.init(PROJECT)

MODEL = "Qwen/Qwen3-Coder-480B-A35B-Instruct"

client = openai.OpenAI(
    base_url="https://api.inference.wandb.ai/v1",
    api_key=os.getenv("WANDB_API_KEY"),
    project=PROJECT,
    default_headers={
        "OpenAI-Project": "wandb_fc/quickstart_playground"  # replace with your actual team/project
    }
)

def extract_code(md: str) -> str:
    """Return the first Python code block from a Markdown reply, or the raw text."""
    parts = md.split("```")
    if len(parts) == 1:
        return md
    for i in range(1, len(parts), 2):
        block = parts[i]
        if block.lower().startswith("python"):
            return block.split("\n", 1)[1]
        return block
    return md

def run_sorting_demo():
    out = Path("artifacts_sorting"); out.mkdir(exist_ok=True)
    prompt = (
        "Write a standalone Python module implementing merge sort with merge_sort(xs). "
        "Include a docstring explaining O(n log n) time and O(n) space. Add a __main__ demo. Return only code."
    )
    resp = client.chat.completions.create(
        model=MODEL,
        messages=[
            {"role": "system", "content": "You are a senior Python engineer."},
            {"role": "user", "content": prompt}
        ],
        temperature=0.2,
        max_tokens=1600,
    )
    code = extract_code(resp.choices[0].message.content)
    (out / "merge_sort.py").write_text(code, encoding="utf-8")
    print("Wrote", (out / "merge_sort.py").resolve())

if __name__ == "__main__":
    run_sorting_demo()
You'll find the inputs and outputs recorded to your Weave dashboard with the parameters automatically included:


Step 4: Using Qwen3-Coder to write code docs

Understanding inference parameters

You can fine-tune Qwen3-Coder’s behavior by applying different inference parameters and then compare the outputs in Weave.
import os
from pathlib import Path
import textwrap
import openai
import weave

PROJECT = "wandb_inference"
MODEL = "Qwen/Qwen3-Coder-480B-A35B-Instruct"

weave.init(PROJECT)

client = openai.OpenAI(
    base_url="https://api.inference.wandb.ai/v1",
    api_key=os.getenv("WANDB_API_KEY"),
    project=PROJECT,
    default_headers={"OpenAI-Project": "wandb_fc/quickstart_playground"},
)

def write_demo_code(root: Path) -> None:
    src = root / "src"
    tests = root / "tests"
    src.mkdir(parents=True, exist_ok=True)
    tests.mkdir(parents=True, exist_ok=True)

    (src / "sorting.py").write_text(textwrap.dedent("""\
        \"\"\"Sorting utilities.\"\"\"

        from typing import List, TypeVar, Callable, Optional

        T = TypeVar("T")

        def merge_sort(items: List[T], key: Optional[Callable[[T], object]] = None) -> List[T]:
            \"\"\"Stable merge sort.

            Time: O(n log n)
            Space: O(n) auxiliary

            Args:
                items: list of values
                key: optional projection

            Returns:
                New sorted list.
            \"\"\"
            if key is None:
                key = lambda x: x

            def merge(left, right):
                out = []
                i = j = 0
                while i < len(left) and j < len(right):
                    if key(left[i]) <= key(right[j]):
                        out.append(left[i]); i += 1
                    else:
                        out.append(right[j]); j += 1
                out.extend(left[i:]); out.extend(right[j:])
                return out

            n = len(items)
            if n <= 1:
                return list(items)

            mid = n // 2
            left = merge_sort(items[:mid], key)
            right = merge_sort(items[mid:], key)
            return merge(left, right)
        """), encoding="utf-8")

    (src / "search.py").write_text(textwrap.dedent("""\
        \"\"\"Search utilities.\"\"\"

        from typing import Sequence, TypeVar, Callable, Optional

        T = TypeVar("T")

        def binary_search(seq: Sequence[T], target: T, key: Optional[Callable[[T], object]] = None) -> int:
            \"\"\"Return index of target in a sorted sequence or -1 if not found.\"\"\"
            if key is None:
                key = lambda x: x
            lo, hi = 0, len(seq) - 1
            while lo <= hi:
                mid = (lo + hi) // 2
                kmid = key(seq[mid])
                if kmid == key(target):
                    return mid
                if kmid < key(target):
                    lo = mid + 1
                else:
                    hi = mid - 1
            return -1
        """), encoding="utf-8")

    (src / "buggy.py").write_text(textwrap.dedent("""\
        \"\"\"A tiny bug for the docs debugging section.\"\"\"

        def reverse_string(s: str) -> str:
            \"\"\"Return the reverse of s.

            Intentional bug: slice step is 1 instead of -1.
            \"\"\"
            return s[::1]  # bug
        """), encoding="utf-8")

    (src / "app.py").write_text(textwrap.dedent("""\
        \"\"\"CLI-free demo runner.\"\"\"
        from sorting import merge_sort
        from search import binary_search
        from buggy import reverse_string

        def run_demo() -> dict:
            data = [5, 2, 9, 1, 5, 6]
            sorted_data = merge_sort(data)
            idx_of_5 = binary_search(sorted_data, 5)
            buggy = reverse_string("weave")
            return {
                "input": data,
                "sorted": sorted_data,
                "index_of_5": idx_of_5,
                "reverse_bug": buggy,
            }

        if __name__ == "__main__":
            print(run_demo())
        """), encoding="utf-8")

    (tests / "test_sorting.py").write_text(textwrap.dedent("""\
        from src.sorting import merge_sort

        def test_merge_sort_basic():
            assert merge_sort([3,1,2]) == [1,2,3]

        def test_merge_sort_stability():
            items = [("a",1),("b",1),("c",0)]
            out = merge_sort(items, key=lambda x: x[1])
            assert out == [("c",0),("a",1),("b",1)]
        """), encoding="utf-8")

def collect_context(root: Path) -> str:
    files = [
        root / "src" / "sorting.py",
        root / "src" / "search.py",
        root / "src" / "buggy.py",
        root / "src" / "app.py",
        root / "tests" / "test_sorting.py",
    ]
    parts = []
    for p in files:
        try:
            txt = p.read_text(encoding="utf-8")
        except Exception:
            txt = ""
        header = f"\nFile: {p.as_posix()}\n"
        snippet = txt.strip()
        if len(snippet) > 1200:
            snippet = snippet[:1200]
        parts.append(header + snippet)
    return "\n".join(parts)

def run_docs_demo():
    out = Path("artifacts_docs")
    out.mkdir(exist_ok=True)

    write_demo_code(out)
    code_context = collect_context(out)

    project_context = textwrap.dedent("""\
        Project name
        Personal Tools Demo

        Summary
        This repo contains a small Python package with merge sort, binary search, and a deliberately buggy reverse function.
        A simple runner shows usage. Tests cover sorting.
        """).strip()

    prompt = f"""
Write a README with plain headers only.
Sections: Title, Overview, Setup, How to Run, Code Tour, Complexity, Testing, Debugging Notes, Outputs, Notes.
Explain merge sort complexity as O(n log n) and note O(n) auxiliary space.
Avoid lists, bold, italics, double hyphens, and words like critical or essential.
Describe the included files and what they do.
Include a short note that reverse_string has a slice bug and how to fix it by changing s[::1] to s[::-1].
Context
{project_context}

Code
{code_context}
""".strip()

    resp = client.chat.completions.create(
        model=MODEL,
        messages=[
            {"role": "system", "content": "You write clear technical READMEs."},
            {"role": "user", "content": prompt},
        ],
        temperature=0.2,
        max_tokens=1400,
    )
    text = resp.choices[0].message.content
    (out / "README.md").write_text(text, encoding="utf-8")
    print("Wrote", (out / "README.md").resolve())
    print("Wrote code under", out.resolve())

if __name__ == "__main__":
    run_docs_demo()
Parameter Guidelines:
  • Temperature: Use 0.1-0.3 for analytical tasks, 0.7-0.9 for creative work
  • Top_p: Combine with temperature; 0.9 works well for most applications
Streaming responses provides a more interactive experience, which is ideal for chatbots or applications with long outputs; here’s how to enable it.
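Here’s a minimal streaming sketch, reusing the client and MODEL objects defined in the step above; the only change on the request side is stream=True, and the chunks arrive incrementally:
stream = client.chat.completions.create(
    model=MODEL,
    messages=[{"role": "user", "content": "Explain merge sort in two sentences."}],
    temperature=0.2,  # lower temperature keeps the explanation focused
    max_tokens=300,
    stream=True,
)

# Print tokens as they arrive instead of waiting for the full completion.
for chunk in stream:
    delta = chunk.choices[0].delta.content
    if delta:
        print(delta, end="", flush=True)
print()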


Running inference with Qwen3-Coder's unique capabilities

I'll demonstrate a few use cases for Qwen3-Coder.

Long context inference

Qwen3-Coder excels at running inference on extensive documents. Here's a practical example:
import io
import requests
import openai
import weave
from pypdf import PdfReader
import os


PROJECT = "wandb_inference"
weave.init(PROJECT)


PDF_URL = "https://docs.aws.amazon.com/pdfs/bedrock-agentcore/latest/devguide/bedrock-agentcore-dg.pdf"
QUESTION = "Summarize how AgentCore's memory architecture functions and when to use it."


client = openai.OpenAI(
    base_url="https://api.inference.wandb.ai/v1",
    api_key=os.getenv("WANDB_API_KEY"),
    project=PROJECT,
    default_headers={
        "OpenAI-Project": "wandb_fc/quickstart_playground"  # replace with your actual team/project
    }
)

# Download the PDF and extract text from the first 100 pages
r = requests.get(PDF_URL, timeout=60)
r.raise_for_status()

reader = PdfReader(io.BytesIO(r.content))
pages = reader.pages[:100]
text = "\n\n".join(page.extract_text() or "" for page in pages)

doc_snippet = text

prompt = (
    f"Using the provided AWS Bedrock AgentCore doc, answer: {QUESTION}\n\n"
    f"Documentation:\n{doc_snippet}\n\n"
    "Cite quotes where relevant."
)

resp = client.chat.completions.create(
    model="Qwen/Qwen3-Coder-480B-A35B-Instruct",
    messages=[
        {"role": "system", "content": "You analyze the given text only. If info is missing, say so."},
        {"role": "user", "content": prompt}
    ],
    temperature=0.25,
    max_tokens=1400,
)

print(resp.choices[0].message.content)

The call's inputs and outputs are logged to Weave:


Qwen3-Coder inference with W&B Weave

Once initialized, Weave automatically logs all inference API calls. You’ll have access to:
  • Request details: model, parameters, token counts
  • Response data: outputs, runtime, status
  • Usage metrics: tokens consumed, costs, rate limits
  • Performance: latency and throughput patterns
You can access your logs in the W&B dashboard, filter by run, and analyze patterns. Adding custom annotations helps organize logs by use case or experiment.
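For a quick check without opening the dashboard, you can also read the usage block on the OpenAI-compatible response object itself; a small sketch, assuming resp is the response from one of the calls above:
usage = resp.usage
print("Prompt tokens:    ", usage.prompt_tokens)
print("Completion tokens:", usage.completion_tokens)
print("Total tokens:     ", usage.total_tokens)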

Custom Weave annotations

Add custom metadata and organize your API calls:
import openai
import wandb
import weave
from pathlib import Path
from typing import Annotated, Literal
from weave import Content

# Init weave
PROJECT = "wandb_inference"
weave.init(PROJECT)

# Configure OpenAI client with W&B Inference
client = openai.OpenAI(
    base_url="https://api.inference.wandb.ai/v1",
    api_key=wandb.api.api_key,
    project=PROJECT,
    default_headers={
        "OpenAI-Project": "wandb_fc/quickstart_playground"
    }
)

MODEL = "Qwen/Qwen3-Coder-480B-A35B-Instruct"

def write_html_file(html_text: str, out_dir="site_output", filename="index.html") -> str:
    out = Path(out_dir)
    out.mkdir(exist_ok=True)
    path = out / filename
    path.write_text(html_text, encoding="utf-8")
    return str(path.resolve())

@weave.op
def generate_html(prompt: str) -> Annotated[str, Content[Literal["html"]]]:
    request = f"""
You are a front-end web developer.
Generate a single self-contained HTML file named index.html.
Put all CSS in a <style> tag and all JS in a <script> tag.
Use semantic HTML, a sticky header, hero section, projects grid, and a contact form with client-side validation.
Plain CSS and vanilla JS only. No external CDNs or assets.
Demonstrate: "{prompt}".
Return only the HTML source, no explanations, no code fences, no extra text.
"""
    resp = client.chat.completions.create(
        model=MODEL,
        messages=[{"role": "system", "content": request}],
        temperature=0.3,
        max_tokens=3500,
    )
    html_text = resp.choices[0].message.content.strip()
    write_html_file(html_text)
    return html_text.encode("utf-8")

if __name__ == "__main__":
    site_request = "A playful portfolio page styled like Windows 95 with a projects grid and contact form"
    result = generate_html(site_request)
    print("HTML logged to Weave and written to site_output/index.html")
Which would appear as:


Best Practices

🔐 Security & Configuration

  • Keep API keys in environment variables rather than embedding them in code.
  • Choose clear and descriptive project names (for example, team/project).
  • Limit API key permissions to only what is required.

✍️ Prompt Engineering

  • Make use of Qwen3-Coder’s extended context support.
  • Define the output format and style you want.
  • Provide thorough system messages to guide context and tone (see the sketch after this list).
  • Tune the temperature setting: lower for analysis, higher for creativity.
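As a minimal sketch of a thorough system message with an explicit output format (this assumes the client and MODEL objects from the earlier steps):
messages = [
    {
        "role": "system",
        "content": (
            "You are a senior Python reviewer. "
            "Respond in plain text with three sections: Summary, Issues, Suggested Fixes. "
            "Keep each section under five sentences."
        ),
    },
    {"role": "user", "content": "Review the merge_sort implementation above."},
]

resp = client.chat.completions.create(
    model=MODEL,
    messages=messages,
    temperature=0.2,  # low temperature for analytical review
    max_tokens=600,
)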

⚡ Performance Optimization

  • Turn on streaming for lengthy outputs.
  • Group similar requests together to reduce time and cost.
  • Track token usage to maintain efficiency.
  • Reuse results by caching frequent queries (see the sketch after this list).
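A minimal caching sketch in plain Python with functools.lru_cache, reusing the client and MODEL from earlier (cached_completion is a hypothetical helper name); repeated identical prompts are served from memory instead of triggering a new API call:
from functools import lru_cache

@lru_cache(maxsize=256)
def cached_completion(model: str, prompt: str, temperature: float = 0.2) -> str:
    # Identical (model, prompt, temperature) triples hit the in-memory cache.
    resp = client.chat.completions.create(
        model=model,
        messages=[{"role": "user", "content": prompt}],
        temperature=temperature,
        max_tokens=800,
    )
    return resp.choices[0].message.content

# The second call returns instantly from the cache without another request.
print(cached_completion(MODEL, "Summarize what merge_sort does in one sentence."))
print(cached_completion(MODEL, "Summarize what merge_sort does in one sentence."))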

📊 Monitoring & Debugging

  • Rely on Weave’s automatic logging for every production call.
  • Attach metadata annotations to keep experiments organized (see the sketch after this list).
  • Check failed requests often to spot problems early.
  • Monitor latency and fine-tune configurations to maintain stable performance.
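As a sketch of metadata annotations (this assumes the weave.attributes context manager available in recent Weave releases, plus the client and MODEL from earlier), you can tag the calls made inside a block so they're easy to filter in the dashboard:
with weave.attributes({"use_case": "code_docs", "experiment": "temperature_0.2"}):
    # Calls traced inside this block carry the attributes above.
    resp = client.chat.completions.create(
        model=MODEL,
        messages=[{"role": "user", "content": "Write a one-line docstring for merge_sort."}],
        temperature=0.2,
        max_tokens=100,
    )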

Next steps

Now that you’ve mastered the basics of Qwen3-Coder:
🔗 Explore advanced features → Review W&B Inference docs and experiment with Weave’s evaluation tools.
📊 Optimize workflows → Create monitoring dashboards, conduct A/B testing for prompts, and develop metrics tailored to your domain.
🚀 Scale deployments → Set up reliable production pipelines, reduce costs through optimization, and connect with other W&B tools.
📚 Deepen your knowledge → Review the Qwen3-Coder Model Card, look through community examples, and keep up with the latest updates.
Iterate on AI agents and models faster. Try Weights & Biases today.