
Integrating W&B Inference with Claude Code: A step-by-step guide

Save 70-80% on AI coding costs by integrating Claude Code with W&B Inference. Complete guide using official APIs - no complex proxy setup required.
Created on July 28 | Last edited on August 4
Claude Code is Anthropic's powerful agentic command-line tool, but using it with Anthropic's API can be expensive for heavy development work. This guide shows you how to configure Claude Code to work with W&B Inference models like Qwen, providing similar capabilities at significantly reduced costs.

What You'll Learn

  • How to configure Claude Code to use W&B Inference directly
  • Setting up proper authentication and project tracking
  • Cost comparison and model selection strategies
  • Troubleshooting common issues
  • Best practices for development workflows

Prerequisites

Before starting, ensure you have:

  • A W&B account with access to W&B Inference
  • A W&B API key (from https://wandb.ai/authorize)
  • Claude Code installed
  • Python 3.8+ with pip (for the test script below)

Understanding the setup

Unlike the previous proxy-based approach, we'll configure Claude Code to connect directly to W&B Inference using their OpenAI-compatible API. This is simpler, more reliable, and officially supported.
The setup works as follows:
  1. Claude Code expects an OpenAI-compatible API
  2. W&B Inference provides an OpenAI-compatible endpoint
  3. We configure Claude Code to point to W&B's endpoint with proper authentication
  4. Weights & Biases handles model serving and usage tracking automatically

Step 1: Get your W&B credentials

First, gather your W&B credentials:
  1. API Key: Get it from https://wandb.ai/authorize
  2. Team/Project: Use the format team/project-name (e.g., mycompany/claude-dev)
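As a quick sanity check before wiring anything into Claude Code, you can validate that your credentials are at least well-formed. This is an illustrative sketch; the `WANDB_API_KEY` and `WANDB_PROJECT` environment variable names here are just examples of where you might store them:

```python
import os
import re

def check_wb_credentials(api_key, project):
    """Return a list of problems with the W&B credentials (empty list = OK)."""
    problems = []
    if not api_key:
        problems.append("API key is missing (get one from https://wandb.ai/authorize)")
    if not project or not re.fullmatch(r"[^/\s]+/[^/\s]+", project or ""):
        problems.append("Project must use the team/project-name format, e.g. mycompany/claude-dev")
    return problems

if __name__ == "__main__":
    for issue in check_wb_credentials(os.environ.get("WANDB_API_KEY"),
                                      os.environ.get("WANDB_PROJECT")):
        print("⚠️", issue)
```

This only checks formatting; the connection test in Step 2 verifies the key actually works.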

Step 2: Test W&B Inference connection

Before configuring Claude Code, let's verify W&B Inference works with a simple test:

Create and run a test script

Create a file named test_wb_inference.py, paste in the code below, and save it somewhere convenient (for example, your desktop).
import openai

client = openai.OpenAI(
    base_url="https://api.inference.wandb.ai/v1",
    api_key="your-wandb-api-key-here",  # Replace with your actual key
    project="your-team/your-project",   # Replace with your team/project
)

try:
    response = client.chat.completions.create(
        model="Qwen/Qwen3-Coder-480B-A35B-Instruct",
        messages=[
            {"role": "system", "content": "You are a helpful coding assistant."},
            {"role": "user", "content": "Write a simple Python function to calculate factorial."},
        ],
        max_tokens=1000,
        temperature=0.7,
    )
    print("✅ W&B Inference connection successful!")
    print("Response:", response.choices[0].message.content)
    print("Model used:", response.model)
    print("Tokens used:", response.usage.total_tokens if response.usage else "N/A")
except Exception as e:
    print("❌ Connection failed:", str(e))
Run the test:
pip install openai
python test_wb_inference.py
If this works, you're ready to configure Claude Code.

Step 3: Configure Claude Code Environment

Claude Code uses environment variables for configuration. Create a script to set them up:

Create configuration script

#!/bin/bash
# setup_claude_wb.sh - W&B Inference configuration

export OPENAI_API_KEY="your-wandb-api-key-here"
export OPENAI_BASE_URL="https://api.inference.wandb.ai/v1"
export OPENAI_PROJECT="your-team/your-project"

# Optional: Disable non-essential traffic for better performance
export CLAUDE_CODE_DISABLE_NONESSENTIAL_TRAFFIC=1

echo "✅ Claude Code configured for W&B Inference"
echo "Base URL: $OPENAI_BASE_URL"
echo "Project: $OPENAI_PROJECT"
echo ""
echo "You can now run Claude Code commands like:"
echo "  claude 'Write a Python function to sort a list'"
echo "  claude --help"
Make it executable:
chmod +x setup_claude_wb.sh

Step 4: Test Claude Code Integration

Source your configuration and test Claude Code:
# Load the configuration
source setup_claude_wb.sh

# Test with a simple command
claude "Write a simple 'Hello, World!' program in Python"

# Test with a more complex coding task
claude "Create a Python class for a basic calculator with add, subtract, multiply, and divide methods"

Step 5: Model Selection and Configuration

W&B Inference offers several Qwen models optimized for different use cases:

Available Models (as of January 2025)

| Model | Best For | Context Length | Speed |
|---|---|---|---|
| Qwen/Qwen3-Coder-480B-A35B-Instruct | Code generation, complex programming tasks | 32K | Medium |
| Qwen/Qwen3-235B-A22B-Instruct-2507 | General coding assistance, documentation | 32K | Fast |
| Qwen/Qwen2.5-Coder-32B-Instruct | Lightweight coding tasks, quick responses | 32K | Very Fast |


Configure Default Model

Claude Code will use the model specified in your requests, but you can set preferences:
# For heavy coding work - use the most capable model
export CLAUDE_DEFAULT_MODEL="Qwen/Qwen3-Coder-480B-A35B-Instruct"

# For quick tasks - use the faster model
export CLAUDE_DEFAULT_MODEL="Qwen/Qwen2.5-Coder-32B-Instruct"
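If you drive the API directly from scripts, the same model-selection idea can be expressed in code. The helper below is an illustrative sketch (the routing labels are arbitrary; only the model names come from the table above):

```python
# Model names from the table above; the routing labels are illustrative.
MODELS = {
    "complex": "Qwen/Qwen3-Coder-480B-A35B-Instruct",   # heavy code generation
    "general": "Qwen/Qwen3-235B-A22B-Instruct-2507",    # docs, general assistance
    "quick":   "Qwen/Qwen2.5-Coder-32B-Instruct",       # fast, lightweight tasks
}

def pick_model(task):
    """Route a task label to a model, defaulting to the general-purpose one."""
    return MODELS.get(task, MODELS["general"])
```

You would then pass `pick_model("quick")` as the `model` argument in your API calls.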

Step 6: Advanced Configuration

Create a Persistent Configuration

Instead of sourcing a script each time, create a permanent configuration:
# Add to your ~/.bashrc or ~/.zshrc
echo 'export OPENAI_API_KEY="your-wandb-api-key-here"' >> ~/.bashrc
echo 'export OPENAI_BASE_URL="https://api.inference.wandb.ai/v1"' >> ~/.bashrc
echo 'export OPENAI_PROJECT="your-team/your-project"' >> ~/.bashrc
echo 'export CLAUDE_CODE_DISABLE_NONESSENTIAL_TRAFFIC=1' >> ~/.bashrc

# Reload your shell
source ~/.bashrc

Project-Specific Configuration

For different projects, you can use different W&B projects:
# project-a-config.sh
export OPENAI_PROJECT="mycompany/project-a"
export CLAUDE_PROJECT_CONTEXT="Working on project A - focus on React/TypeScript"

# project-b-config.sh
export OPENAI_PROJECT="mycompany/project-b"
export CLAUDE_PROJECT_CONTEXT="Working on project B - focus on Python/Django"

Cost Comparison

Here's an updated cost comparison based on current pricing:
| Service | Input (per 1M tokens) | Output (per 1M tokens) | Total Cost Example* |
|---|---|---|---|
| Anthropic Claude Sonnet 4 | $3.00 | $15.00 | $18.00 |
| W&B Qwen3-Coder-480B | ~$1.00 | ~$3.00 | $4.00 |
| Savings | 67% | 80% | 78% |

*Example based on 1M input tokens + 1M output tokens
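The table's arithmetic can be reproduced with a few lines of Python (the W&B prices are the approximate figures from the table, not a quote):

```python
def job_cost(input_tokens, output_tokens, in_price_per_m, out_price_per_m):
    """Cost in dollars for a job, given per-million-token prices."""
    return (input_tokens / 1e6) * in_price_per_m + (output_tokens / 1e6) * out_price_per_m

# Prices from the table above (W&B prices are approximate).
anthropic = job_cost(1_000_000, 1_000_000, 3.00, 15.00)   # $18.00
wandb     = job_cost(1_000_000, 1_000_000, 1.00, 3.00)    # $4.00
savings   = 100 * (anthropic - wandb) / anthropic          # ≈ 78%
```

Plugging in your own token counts gives a per-task estimate for either backend.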

Integration with Development Tools

VS Code Integration

# In VS Code terminal, source your configuration
source setup_claude_wb.sh

# Now use Claude Code within VS Code
claude "Add error handling to this function" --file src/utils.py
claude "Write unit tests for this class" --file src/models.py

Git Workflow Integration

# Generate commit messages
git diff --cached | claude "Write a concise commit message for these changes"

# Code review assistance
git diff main..feature-branch | claude "Review this code for potential issues and suggest improvements"

# Documentation generation
claude "Generate README documentation for this project" --file package.json

IDE Integration

Many IDEs can be configured to use custom OpenAI endpoints:
// VS Code settings.json for extensions like "Claude" or "AI Coder"
{
  "aiCoder.openai.baseUrl": "https://api.inference.wandb.ai/v1",
  "aiCoder.openai.apiKey": "your-wandb-api-key",
  "aiCoder.openai.model": "Qwen/Qwen3-Coder-480B-A35B-Instruct"
}

Monitoring and Usage Tracking

W&B Dashboard

Monitor your usage through the W&B interface:
  1. Go to your W&B project dashboard
  2. Navigate to the "Inference" section
  3. View usage statistics, costs, and model performance
  4. Set up alerts for usage thresholds

Usage Tracking Script

# usage_tracker.py - Monitor your Claude Code usage
import openai
import weave

# Initialize Weave for automatic logging
weave.init("your-team/your-project")

client = openai.OpenAI(
    base_url="https://api.inference.wandb.ai/v1",
    api_key="your-wandb-api-key",
    project="your-team/your-project",
)

# Your Claude Code requests will now be automatically logged to W&B

Troubleshooting Common Issues

Issue 1: Authentication Errors

# Verify your API key
curl -H "Authorization: Bearer your-api-key" \
-H "OpenAI-Project: your-team/your-project" \
https://api.inference.wandb.ai/v1/models

# Should return a list of available models

Issue 2: Model Not Found

# List available models
curl -H "Authorization: Bearer your-api-key" \
-H "OpenAI-Project: your-team/your-project" \
https://api.inference.wandb.ai/v1/models | jq '.data[].id'

Issue 3: Claude Code Not Recognizing Configuration

# Verify environment variables are set
echo "API Key: ${OPENAI_API_KEY:0:10}..."
echo "Base URL: $OPENAI_BASE_URL"
echo "Project: $OPENAI_PROJECT"

# Test with verbose output
claude --debug "Simple test query"

Issue 4: Rate Limiting

# Add delays between requests if hitting rate limits
export CLAUDE_REQUEST_DELAY=1 # 1 second delay between requests
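If you call the endpoint directly (for example, from the test script in Step 2), a generic retry-with-exponential-backoff wrapper is a common pattern for handling rate limits. This is an illustrative sketch, not part of Claude Code itself:

```python
import time

def with_backoff(fn, retries=4, base_delay=1.0, sleep=time.sleep):
    """Call fn(), retrying with exponential backoff on any exception."""
    for attempt in range(retries):
        try:
            return fn()
        except Exception:
            if attempt == retries - 1:
                raise  # out of retries; surface the error
            sleep(base_delay * (2 ** attempt))  # 1s, 2s, 4s, ...
```

You might wrap a request as `with_backoff(lambda: client.chat.completions.create(...))`; a production version would retry only on rate-limit errors rather than any exception.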

Best Practices

1. Security

  • Store API keys in environment variables, not in code
  • Use project-specific API keys when possible
  • Regularly rotate API keys
  • Never commit API keys to version control

2. Cost Optimization

  • Use lighter models (Qwen2.5-Coder-32B) for simple tasks
  • Monitor usage through W&B dashboard
  • Set up billing alerts
  • Cache common responses when appropriate
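The caching bullet above can be sketched as a tiny in-memory cache keyed on a hash of the model and prompt. This is illustrative only; a real setup might persist entries to disk or expire them:

```python
import hashlib

class PromptCache:
    """Tiny in-memory cache keyed on a hash of (model, prompt)."""
    def __init__(self):
        self._store = {}

    def key(self, model, prompt):
        return hashlib.sha256(f"{model}\x00{prompt}".encode()).hexdigest()

    def get_or_call(self, model, prompt, call):
        """Return a cached response, invoking call(model, prompt) only on a miss."""
        k = self.key(model, prompt)
        if k not in self._store:
            self._store[k] = call(model, prompt)
        return self._store[k]
```

Repeated identical prompts (boilerplate generation, common lookups) then cost one API call instead of many.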

3. Performance

  • Use appropriate models for different task types
  • Configure reasonable token limits
  • Enable request caching when beneficial
  • Monitor response times and adjust model selection

4. Development Workflow

  • Create project-specific configurations
  • Use meaningful project names for tracking
  • Document your model choices and reasoning
  • Share configurations with team members

Example Workflows

Daily Development Routine

#!/bin/bash
# daily_dev_setup.sh

# Load W&B configuration
source setup_claude_wb.sh

# Start development session
echo "🚀 Starting development session with W&B Inference"

# Quick status check
if claude "What's the current date and time?" > /dev/null 2>&1; then
    echo "✅ Claude Code + W&B Inference ready"
else
    echo "❌ Configuration issue - check your setup"
    exit 1
fi

# Optional: Start your IDE
# code .

Code Review Workflow

#!/bin/bash
# code_review.sh

# Review staged changes
if git diff --cached --quiet; then
    echo "No staged changes to review"
    exit 1
fi

echo "🔍 Reviewing staged changes..."
git diff --cached | claude "Review this code for:
1. Potential bugs or issues
2. Code quality and best practices
3. Security concerns
4. Performance implications
5. Suggestions for improvement

Be constructive and specific in your feedback."

Migration from Anthropic API

If you're currently using Claude Code with Anthropic's API:

1. Backup Current Configuration

# Save current settings
echo "Current ANTHROPIC_API_KEY: ${ANTHROPIC_API_KEY:0:10}..."
echo "Current ANTHROPIC_BASE_URL: $ANTHROPIC_BASE_URL"

2. Gradual Migration

# Test W&B setup alongside existing configuration
export OPENAI_API_KEY="your-wandb-key"
export OPENAI_BASE_URL="https://api.inference.wandb.ai/v1"

# Test a simple query
claude "Hello, this is a test of W&B Inference"

# If successful, update your permanent configuration

3. Compare Results

Run the same coding tasks with both APIs to compare:
  • Response quality
  • Response speed
  • Cost per task
  • Overall satisfaction
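For the speed comparison, a minimal timing harness is enough. This sketch is backend-agnostic: pass in a closure that calls whichever API you are comparing:

```python
import time

def benchmark(label, call, runs=3):
    """Time repeated calls to a backend and report the average latency."""
    timings = []
    for _ in range(runs):
        start = time.perf_counter()
        call()  # e.g. lambda: client.chat.completions.create(...)
        timings.append(time.perf_counter() - start)
    avg = sum(timings) / runs
    print(f"{label}: avg {avg*1000:.1f} ms over {runs} runs")
    return avg
```

Running it once per backend with the same prompt gives a rough latency comparison; response quality still needs a manual read.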

Conclusion

This updated approach provides a much more reliable integration between Claude Code and W&B Inference by:
  • Using the official W&B Inference OpenAI-compatible API
  • Eliminating the complexity and potential issues of proxy servers
  • Providing better error handling and debugging capabilities
  • Offering more straightforward monitoring and usage tracking
The direct integration approach is:
  • Simpler: No proxy server to manage
  • More reliable: Fewer moving parts and potential failure points
  • Better supported: Uses official APIs from both services
  • Easier to debug: Direct connection makes troubleshooting straightforward
With this setup, you can achieve significant cost savings (70-80%) while maintaining excellent coding assistance capabilities. The W&B Inference models provide strong performance for coding tasks at a fraction of the cost of Anthropic's API.

Next Steps

  1. Test the setup: Follow the steps above to verify everything works
  2. Monitor usage: Keep track of costs and performance through W&B dashboard
  3. Optimize model selection: Experiment with different models for different tasks
  4. Share with team: Roll out this cost-effective solution to other developers
  5. Stay updated: Monitor W&B Inference for new models and features

Happy coding with your cost-effective AI assistant! 🎉


Arvit Varfaj
Arvit Varfaj •  
is this still working?
Reply
Iterate on AI agents and models faster. Try Weights & Biases today.