Tutorial: Run inference with Qwen3-Coder using W&B Inference
Getting set up and running Qwen3-Coder, Qwen's code-specialized large language model, in Python using W&B Inference.
Running inference with Qwen3-Coder (a.k.a. Qwen3-Coder-480B-A35B-Instruct) through W&B Inference powered by CoreWeave is quick to set up yet powerful enough for sophisticated use cases. In this tutorial, you’ll see how to set up the model, perform inference, and make use of advanced features, while also tracking and troubleshooting your experiments with W&B Weave.
Whether you’re working with long documents, building multilingual systems, or tackling complex reasoning tasks, this guide gives you the tools to use Qwen3-Coder effectively in your workflow.
Table of contents
- What is Qwen3-Coder
- W&B Weave
- Tutorial: Running inference with Qwen3-Coder using W&B Inference
- Prerequisites
- Step 1: Installation & setup
- Step 2: Environment configuration
- Step 3: Writing a sorting algorithm with Qwen3-Coder
- Step 4: Using Qwen3-Coder to write code docs
- Running inference with Qwen3-Coder’s unique capabilities
- Qwen3-Coder inference with W&B Weave
- Best Practices
- 🔐 Security & Configuration
- ✍️ Prompt Engineering
- ⚡ Performance Optimization
- 📊 Monitoring & Debugging
- Next steps
What is Qwen3-Coder
Qwen3-Coder is a large-scale code-specialized language model from Qwen, designed to deliver top-tier performance across software engineering, reasoning, and multilingual programming while staying tightly aligned with developer workflows.
🖥️ On coding and software engineering tasks, it establishes itself as one of the strongest open models. On SWE-Bench Verified with OpenHands it scores 67.0% (100 turns) and 69.6% (500 turns), competitive with Claude 4 Sonnet and ahead of GPT-4.1. On Aider-Polyglot it reaches 61.8%, surpassing Claude and GPT-4.1. In live settings, it posts 26.3% on SWE-Bench Live, ahead of Kimi and DeepSeek, and it records 54.7% on SWE-Bench Multilingual, outperforming most competitors.
➗ In reasoning-heavy software benchmarks, it shows balanced strength. It achieves 25.8% on Multi-SWE-bench mini and 27.0% on Multi-SWE-bench flash, narrowly edging out Claude Sonnet and far above DeepSeek. On Spider2, which tests complex database reasoning, it posts 31.1%, tied with Claude and nearly double GPT-4.1.
🌍 For multilingual programming, Qwen3-Coder demonstrates robust ability. Its 54.7% score on SWE-Bench Multilingual places it above Claude and well ahead of GPT-4.1, while its performance on Aider-Polyglot confirms its adaptability across diverse languages and coding environments.
📊 Overall, Qwen3-Coder delivers state-of-the-art open-model performance in software engineering, combining long-context reasoning, strong tool-use, and competitive multilingual coding ability. It rivals or surpasses closed models like Claude and GPT-4.1 in many coding benchmarks, making it one of the best developer-focused models available today.
For detailed technical specifications and performance benchmarks, visit the Qwen3-Coder model documentation.
W&B Weave
W&B Weave goes beyond simple logging; it organizes and visualizes your model runs so you can debug, compare, and refine more effectively.
Getting started is easy: just import the library and initialize it with your project name.
One notable feature is the @weave.op decorator. In standard Python, functions execute without capturing their inputs or outputs. By using @weave.op, each function call is logged automatically, eliminating the need to create your own logging tools or clutter notebooks with print statements.
All logs appear in the Weave dashboard, where you can:
- View interactive visualizations, timelines, and traces of function calls
- Drill into details and compare different runs
- Trace outputs back to inputs for reproducibility
This makes Weave a powerful tool for model development. Instead of handling scattered log files, you get a unified visual record of your experiments, which simplifies debugging, ensures reproducible results, and helps fine-tune models like Qwen3-Coder with fewer hurdles.
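As a minimal sketch of that workflow (the project name and function here are illustrative, not part of this tutorial's code):

import weave

weave.init("intro-to-weave")  # illustrative project name

@weave.op
def shout(text: str) -> str:
    # Each call's inputs and return value are traced to the Weave dashboard
    return text.upper() + "!"

shout("hello weave")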
Tutorial: Running inference with Qwen3-Coder using W&B Inference
We’ll be using the Qwen/Qwen3-Coder-480B-A35B-Instruct model. The examples here assume you’re running inside a Jupyter Notebook, though the code works in any Python environment.
💡 If you’re not familiar with Jupyter Notebooks, you can get set up in about five minutes. I walk you through it in this tutorial.
Prerequisites
Before starting, ensure you have:
- Python 3.7 or higher installed
- Basic familiarity with Python and API usage
- Understanding of your use case requirements (document analysis, code review, multilingual tasks, etc.)
Step 1: Installation & setup
1. Install required packages
To get started running inference with Qwen3-Coder, all you need to install are the openai, wandb, and weave packages. We’ll also show you how to streamline the review of multiple outputs with W&B Weave, making the process far more efficient.
The code to do this is:
pip install openai wandb weave
Run it in your terminal or in a Jupyter cell.
When you execute the cell, an asterisk ([*]) appears between the brackets. This indicates the cell is still running; wait until the asterisk turns into a number before proceeding.
2. Get your W&B API key
- Log in to W&B and visit https://wandb.ai/authorize
- Copy your API key
- Keep it handy for the next step
Step 2: Environment configuration
Set your W&B API key as an environment variable. Choose the method that fits your workflow:
Option 1: In a Jupyter Notebook
# Set environment variables in your notebook
%env WANDB_API_KEY=your-wandb-api-key-here
Option 2: In Terminal/Shell
export WANDB_API_KEY="your-wandb-api-key-here"
Option 3: In Python script
import os

# Set environment variables programmatically
os.environ["WANDB_API_KEY"] = "your-wandb-api-key-here"
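Whichever option you use, it's worth verifying that the key is actually visible to Python before making any API calls. A quick check:

import os

# Fail fast if the key is missing, rather than at the first API call
assert os.getenv("WANDB_API_KEY"), "WANDB_API_KEY is not set"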
Step 3: Writing a sorting algorithm with Qwen3-Coder
Here’s a simple example of writing a sorting algorithm with Qwen3-Coder.
import os
from pathlib import Path

import openai
import weave

PROJECT = "wandb_inference"
weave.init(PROJECT)

MODEL = "Qwen/Qwen3-Coder-480B-A35B-Instruct"

client = openai.OpenAI(
    base_url="https://api.inference.wandb.ai/v1",
    api_key=os.getenv("WANDB_API_KEY"),
    project=PROJECT,
    default_headers={"OpenAI-Project": "wandb_fc/quickstart_playground"},
)


def extract_code(md: str) -> str:
    """Return the Python code block from a markdown response, if any."""
    parts = md.split("```")
    if len(parts) == 1:
        return md
    for i in range(1, len(parts), 2):
        block = parts[i]
        if block.lower().startswith("python"):
            # Drop the "python" language tag line
            return block.split("\n", 1)[1]
    # No python-tagged block: fall back to the first fenced block
    return parts[1]


def run_sorting_demo():
    out = Path("artifacts_sorting")
    out.mkdir(exist_ok=True)
    prompt = (
        "Write a standalone Python module implementing merge sort with merge_sort(xs). "
        "Include a docstring explaining O(n log n) time and O(n) space. "
        "Add a __main__ demo. Return only code."
    )
    resp = client.chat.completions.create(
        model=MODEL,
        messages=[
            {"role": "system", "content": "You are a senior Python engineer."},
            {"role": "user", "content": prompt},
        ],
        temperature=0.2,
        max_tokens=1600,
    )
    code = extract_code(resp.choices[0].message.content)
    (out / "merge_sort.py").write_text(code, encoding="utf-8")
    print("Wrote", (out / "merge_sort.py").resolve())


if __name__ == "__main__":
    run_sorting_demo()
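Once the script finishes, you can sanity-check the generated module directly. This assumes the model returned valid code and the file landed at artifacts_sorting/merge_sort.py, as in the script above:

import importlib.util

# Load the generated file as a module and exercise merge_sort on a small input
spec = importlib.util.spec_from_file_location("generated_merge_sort", "artifacts_sorting/merge_sort.py")
mod = importlib.util.module_from_spec(spec)
spec.loader.exec_module(mod)

assert mod.merge_sort([3, 1, 2]) == [1, 2, 3]
print("Generated merge_sort passes a basic check.")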
You'll find the inputs and outputs recorded to your Weave dashboard with the parameters automatically included:

Step 4: Using Qwen3-Coder to write code docs
Understanding inference parameters
You can fine-tune Qwen3-Coder’s behavior by applying different inference parameters and then compare the outputs in Weave.
import os
from pathlib import Path
import textwrap

import openai
import weave

PROJECT = "wandb_inference"
MODEL = "Qwen/Qwen3-Coder-480B-A35B-Instruct"

weave.init(PROJECT)

client = openai.OpenAI(
    base_url="https://api.inference.wandb.ai/v1",
    api_key=os.getenv("WANDB_API_KEY"),
    project=PROJECT,
    default_headers={"OpenAI-Project": "wandb_fc/quickstart_playground"},
)


def write_demo_code(root: Path) -> None:
    src = root / "src"
    tests = root / "tests"
    src.mkdir(parents=True, exist_ok=True)
    tests.mkdir(parents=True, exist_ok=True)

    (src / "sorting.py").write_text(textwrap.dedent('''\
        """Sorting utilities."""
        from typing import List, TypeVar, Callable, Optional

        T = TypeVar("T")

        def merge_sort(items: List[T], key: Optional[Callable[[T], object]] = None) -> List[T]:
            """Stable merge sort.

            Time: O(n log n)
            Space: O(n) auxiliary

            Args:
                items: list of values
                key: optional projection

            Returns:
                New sorted list.
            """
            if key is None:
                key = lambda x: x

            def merge(left, right):
                out = []
                i = j = 0
                while i < len(left) and j < len(right):
                    if key(left[i]) <= key(right[j]):
                        out.append(left[i]); i += 1
                    else:
                        out.append(right[j]); j += 1
                out.extend(left[i:]); out.extend(right[j:])
                return out

            n = len(items)
            if n <= 1:
                return list(items)
            mid = n // 2
            left = merge_sort(items[:mid], key)
            right = merge_sort(items[mid:], key)
            return merge(left, right)
        '''), encoding="utf-8")

    (src / "search.py").write_text(textwrap.dedent('''\
        """Search utilities."""
        from typing import Sequence, TypeVar, Callable, Optional

        T = TypeVar("T")

        def binary_search(seq: Sequence[T], target: T, key: Optional[Callable[[T], object]] = None) -> int:
            """Return index of target in a sorted sequence or -1 if not found."""
            if key is None:
                key = lambda x: x
            lo, hi = 0, len(seq) - 1
            while lo <= hi:
                mid = (lo + hi) // 2
                kmid = key(seq[mid])
                if kmid == key(target):
                    return mid
                if kmid < key(target):
                    lo = mid + 1
                else:
                    hi = mid - 1
            return -1
        '''), encoding="utf-8")

    (src / "buggy.py").write_text(textwrap.dedent('''\
        """A tiny bug for the docs debugging section."""

        def reverse_string(s: str) -> str:
            """Return the reverse of s.

            Intentional bug: slice step is 1 instead of -1.
            """
            return s[::1]  # bug
        '''), encoding="utf-8")

    (src / "app.py").write_text(textwrap.dedent('''\
        """CLI-free demo runner."""
        from sorting import merge_sort
        from search import binary_search
        from buggy import reverse_string

        def run_demo() -> dict:
            data = [5, 2, 9, 1, 5, 6]
            sorted_data = merge_sort(data)
            idx_of_5 = binary_search(sorted_data, 5)
            buggy = reverse_string("weave")
            return {
                "input": data,
                "sorted": sorted_data,
                "index_of_5": idx_of_5,
                "reverse_bug": buggy,
            }

        if __name__ == "__main__":
            print(run_demo())
        '''), encoding="utf-8")

    (tests / "test_sorting.py").write_text(textwrap.dedent('''\
        from src.sorting import merge_sort

        def test_merge_sort_basic():
            assert merge_sort([3, 1, 2]) == [1, 2, 3]

        def test_merge_sort_stability():
            items = [("a", 1), ("b", 1), ("c", 0)]
            out = merge_sort(items, key=lambda x: x[1])
            assert out == [("c", 0), ("a", 1), ("b", 1)]
        '''), encoding="utf-8")


def collect_context(root: Path) -> str:
    files = [
        root / "src" / "sorting.py",
        root / "src" / "search.py",
        root / "src" / "buggy.py",
        root / "src" / "app.py",
        root / "tests" / "test_sorting.py",
    ]
    parts = []
    for p in files:
        try:
            txt = p.read_text(encoding="utf-8")
        except Exception:
            txt = ""
        header = f"\nFile: {p.as_posix()}\n"
        snippet = txt.strip()
        if len(snippet) > 1200:
            snippet = snippet[:1200]
        parts.append(header + snippet)
    return "\n".join(parts)


def run_docs_demo():
    out = Path("artifacts_docs")
    out.mkdir(exist_ok=True)
    write_demo_code(out)
    code_context = collect_context(out)
    project_context = textwrap.dedent("""\
        Project name
        Personal Tools Demo

        Summary
        This repo contains a small Python package with merge sort, binary search, and a deliberately buggy reverse function.
        A simple runner shows usage. Tests cover sorting.
        """).strip()
    prompt = f"""
Write a README with plain headers only.
Sections: Title, Overview, Setup, How to Run, Code Tour, Complexity, Testing, Debugging Notes, Outputs, Notes.
Explain merge sort complexity as O(n log n) and note O(n) auxiliary space.
Avoid lists, bold, italics, double hyphens, and words like critical or essential.
Describe the included files and what they do.
Include a short note that reverse_string has a slice bug and how to fix it by changing s[::1] to s[::-1].

Context
{project_context}

Code
{code_context}
""".strip()
    resp = client.chat.completions.create(
        model=MODEL,
        messages=[
            {"role": "system", "content": "You write clear technical READMEs."},
            {"role": "user", "content": prompt},
        ],
        temperature=0.2,
        max_tokens=1400,
    )
    text = resp.choices[0].message.content
    (out / "README.md").write_text(text, encoding="utf-8")
    print("Wrote", (out / "README.md").resolve())
    print("Wrote code under", out.resolve())


if __name__ == "__main__":
    run_docs_demo()
Parameter Guidelines:
- Temperature: Use 0.1-0.3 for analytical tasks, 0.7-0.9 for creative work
- top_p: Combine with temperature; 0.9 works well for most applications
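To see how these settings change behavior in practice, you can send the same prompt at two temperatures and compare the resulting traces side by side in Weave. A minimal sketch, reusing the client and MODEL configured in Step 3 (the prompt is illustrative):

# Same prompt at two temperatures; both calls are traced in Weave for comparison
prompt = "Write a one-line Python function that checks whether a string is a palindrome."

for temperature in (0.2, 0.9):
    resp = client.chat.completions.create(
        model=MODEL,
        messages=[{"role": "user", "content": prompt}],
        temperature=temperature,
        top_p=0.9,
        max_tokens=200,
    )
    print(f"temperature={temperature}:\n{resp.choices[0].message.content}\n")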
We also show how to stream responses for a more interactive experience, which is ideal for chatbots or applications with long outputs.
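A minimal streaming sketch, again reusing the client and MODEL from Step 3; setting stream=True yields chunks you can print as they arrive (the prompt is illustrative):

# Print tokens as they stream in rather than waiting for the full completion
stream = client.chat.completions.create(
    model=MODEL,
    messages=[{"role": "user", "content": "Explain merge sort in three sentences."}],
    temperature=0.3,
    max_tokens=400,
    stream=True,
)

for chunk in stream:
    if chunk.choices and chunk.choices[0].delta.content:
        print(chunk.choices[0].delta.content, end="", flush=True)
print()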

Running inference with Qwen3-Coder’s unique capabilities
I'll demonstrate a few use cases for Qwen3-Coder.
Long context inference
Qwen3-Coder excels at running inference on extensive documents. Here's a practical example:
import io
import os

import openai
import requests
import weave
from pypdf import PdfReader

PROJECT = "wandb_inference"
weave.init(PROJECT)

PDF_URL = "https://docs.aws.amazon.com/pdfs/bedrock-agentcore/latest/devguide/bedrock-agentcore-dg.pdf"
QUESTION = "Summarize how AgentCore's memory architecture functions and when to use it."

client = openai.OpenAI(
    base_url="https://api.inference.wandb.ai/v1",
    api_key=os.getenv("WANDB_API_KEY"),
    project=PROJECT,
    default_headers={"OpenAI-Project": "wandb_fc/quickstart_playground"},  # replace with your actual team/project
)

# Download the PDF and extract text from the first 100 pages
r = requests.get(PDF_URL, timeout=60)
r.raise_for_status()
reader = PdfReader(io.BytesIO(r.content))
pages = reader.pages[:100]
text = "\n\n".join(page.extract_text() or "" for page in pages)
doc_snippet = text

prompt = (
    f"Using the provided AWS Bedrock AgentCore doc, answer: {QUESTION}\n\n"
    f"Documentation:\n{doc_snippet}\n\n"
    "Cite quotes where relevant."
)

resp = client.chat.completions.create(
    model="Qwen/Qwen3-Coder-480B-A35B-Instruct",
    messages=[
        {"role": "system", "content": "You analyze the given text only. If info is missing, say so."},
        {"role": "user", "content": prompt},
    ],
    temperature=0.25,
    max_tokens=1400,
)
print(resp.choices[0].message.content)
The call and its response are logged to Weave:

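If a document is too large for a single request, a common fallback is map-reduce style summarization: summarize chunks, then summarize the summaries. A rough sketch reusing the client above (chunk size and prompts are illustrative, not tuned):

# Summarize oversized documents in chunks, then combine the partial summaries
def summarize_long_text(text: str, chunk_chars: int = 20000) -> str:
    chunks = [text[i:i + chunk_chars] for i in range(0, len(text), chunk_chars)]
    partials = []
    for chunk in chunks:
        resp = client.chat.completions.create(
            model="Qwen/Qwen3-Coder-480B-A35B-Instruct",
            messages=[{"role": "user", "content": f"Summarize this excerpt:\n\n{chunk}"}],
            temperature=0.2,
            max_tokens=400,
        )
        partials.append(resp.choices[0].message.content)
    combined = "\n\n".join(partials)
    resp = client.chat.completions.create(
        model="Qwen/Qwen3-Coder-480B-A35B-Instruct",
        messages=[{"role": "user", "content": f"Combine these partial summaries into one coherent summary:\n\n{combined}"}],
        temperature=0.2,
        max_tokens=800,
    )
    return resp.choices[0].message.content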
Qwen3-Coder inference with W&B Weave
Once initialized, Weave automatically logs all inference API calls. You’ll have access to:
- Request details: model, parameters, token counts
- Response data: outputs, runtime, status
- Usage metrics: tokens consumed, costs, rate limits
- Performance: latency and throughput patterns
You can access your logs in the W&B dashboard, filter by run, and analyze patterns. Adding custom annotations helps organize logs by use case or experiment.
Custom Weave annotations
Add custom metadata and organize your API calls:
import os
from pathlib import Path
from typing import Annotated, Literal

import openai
import weave
from weave import Content

# Init Weave
PROJECT = "wandb_inference"
weave.init(PROJECT)

# Configure OpenAI client with W&B Inference
client = openai.OpenAI(
    base_url="https://api.inference.wandb.ai/v1",
    api_key=os.getenv("WANDB_API_KEY"),
    project=PROJECT,
    default_headers={"OpenAI-Project": "wandb_fc/quickstart_playground"},
)

MODEL = "Qwen/Qwen3-Coder-480B-A35B-Instruct"


def write_html_file(html_text: str, out_dir="site_output", filename="index.html") -> str:
    out = Path(out_dir)
    out.mkdir(exist_ok=True)
    path = out / filename
    path.write_text(html_text, encoding="utf-8")
    return str(path.resolve())


@weave.op
def generate_html(prompt: str) -> Annotated[str, Content[Literal["html"]]]:
    request = f"""You are a front-end web developer.
Generate a single self-contained HTML file named index.html.
Put all CSS in a <style> tag and all JS in a <script> tag.
Use semantic HTML, a sticky header, hero section, projects grid, and a contact form with client-side validation.
Plain CSS and vanilla JS only. No external CDNs or assets.
Demonstrate: "{prompt}".
Return only the HTML source, no explanations, no code fences, no extra text."""
    resp = client.chat.completions.create(
        model=MODEL,
        messages=[{"role": "system", "content": request}],
        temperature=0.3,
        max_tokens=3500,
    )
    html_text = resp.choices[0].message.content.strip()
    write_html_file(html_text)
    return html_text


if __name__ == "__main__":
    site_request = "A playful portfolio page styled like Windows 95 with a projects grid and contact form"
    result = generate_html(site_request)
    print("HTML logged to Weave and written to site_output/index.html")
Which would appear as:

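Beyond media annotations, Weave also lets you attach arbitrary metadata to traced calls with the weave.attributes context manager; anything you pass shows up on the logged call. A small sketch (the keys and values are arbitrary examples):

# Attach custom metadata to the traced call; keys and values are arbitrary examples
with weave.attributes({"use_case": "portfolio_site", "experiment": "win95_theme"}):
    result = generate_html("A retro portfolio page with a projects grid")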
Best Practices
🔐 Security & Configuration
- Keep API keys in environment variables rather than embedding them in code.
- Choose clear and descriptive project names (for example, team/project).
- Limit API key permissions to only what is required.
✍️ Prompt Engineering
- Make use of Qwen3-Coder’s extended context support.
- Define the output format and style you want.
- Provide thorough system messages to guide context and tone.
- Tune the temperature setting: lower for analysis, higher for creativity.
⚡ Performance Optimization
- Turn on streaming for lengthy outputs.
- Group similar requests together to reduce time and cost.
- Track token usage to maintain efficiency.
- Reuse results by caching frequent queries (see the sketch below).
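For deterministic settings (temperature 0), even a simple in-process cache avoids paying for the same completion twice. A minimal sketch, reusing the client configured earlier (cache size and parameters are illustrative):

from functools import lru_cache

@lru_cache(maxsize=256)
def cached_completion(prompt: str) -> str:
    # Repeated identical prompts hit the cache instead of the API
    resp = client.chat.completions.create(
        model="Qwen/Qwen3-Coder-480B-A35B-Instruct",
        messages=[{"role": "user", "content": prompt}],
        temperature=0.0,
        max_tokens=500,
    )
    return resp.choices[0].message.content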
📊 Monitoring & Debugging
- Rely on Weave’s automatic logging for every production call.
- Attach metadata annotations to keep experiments organized.
- Check failed requests often to spot problems early.
- Monitor latency and fine-tune configurations to maintain stable performance.
Next steps
Now that you’ve mastered the basics of Qwen3-Coder:
🔗 Explore advanced features → Review W&B Inference docs and experiment with Weave’s evaluation tools.
📊 Optimize workflows → Create monitoring dashboards, conduct A/B testing for prompts, and develop metrics tailored to your domain.
🚀 Scale deployments → Set up reliable production pipelines, reduce costs through optimization, and connect with other W&B tools.
📚 Deepen your knowledge → Review the Qwen3-Coder Model Card, look through community examples, and keep up with the latest updates.