Introducing W&B Inference powered by CoreWeave
Find the right open-source model for your unique use case inside Weights & Biases
As you iterate on your AI agents and applications, you often want to understand which LLM is the most effective for your specific task. Experimenting with and evaluating LLMs involves multiple operational complexities. Teams typically need to sign up for multiple model-hosting providers or deploy models themselves, manage separate accounts, handle various API keys, and instrument their application code for tracing and observability. This approach introduces unnecessary complexity and forces developers to juggle disconnected platforms for hosting and observability.
W&B Inference: Easily access and explore open-source models
W&B Inference powered by CoreWeave, released in preview today, addresses these challenges by providing instant, unified access to powerful open-source foundation models directly in Weights & Biases without the additional overhead of managing multiple model providers or self-deploying models. Simply log in to your Weights & Biases account, view the model catalog, select one or more models, and run inference in the intuitive W&B Weave Playground or via an OpenAI-compatible API.
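If you prefer to start from the API side, the OpenAI-compatible endpoint means the standard openai Python client works with only a base URL and key change. Below is a minimal sketch that lists the available models; it assumes the endpoint exposes the standard /v1/models listing route, and the API key and OpenAI-Project header placeholders mirror the full code sample later in this post.

import openai

# Point the standard OpenAI client at the W&B Inference endpoint.
client = openai.OpenAI(
    base_url="https://api.inference.wandb.ai/v1",
    api_key="<your-api-key>",
    # Team and project are required for usage tracking
    default_headers={"OpenAI-Project": "<team>/<project>"},
)

# List the models currently in the catalog (assumes the standard
# OpenAI-compatible /v1/models route is implemented).
for model in client.models.list():
    print(model.id)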
W&B Inference includes integrated observability tools provided by W&B Weave to evaluate, monitor, and iterate on AI applications with confidence. Weave offers comprehensive tools, including trace trees, evaluations, playground, human feedback, scorers, guardrails, monitors, and more for building AI agents and applications. We’re launching W&B Inference with some of the most popular open-source models from leading providers, including DeepSeek, Meta, and Microsoft, and will continue to expand our model selection.
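As a concrete example of what that instrumentation looks like in code, here is a minimal sketch of Weave tracing: weave.init points traces at a project, and the weave.op decorator records each call as a node in a trace tree. The project name and the annotate function are illustrative placeholders, not anything specific to W&B Inference.

import weave

# Initialize Weave; traces from decorated functions are logged to this project.
weave.init("<team>/<project>")

# Any function decorated with weave.op is captured as a trace (inputs, outputs,
# latency, and nested calls), which is the basis for the trace trees mentioned above.
@weave.op()
def annotate(code_snippet: str) -> str:
    # Placeholder: the real application would call an LLM here.
    return f"# Annotation for: {code_snippet[:40]}"

annotate("def add(a, b):\n    return a + b")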
Hands-on with W&B Inference
Consider a scenario where a software engineering team wants to optimize their AI-powered code annotation application. Currently, the team uses a large proprietary LLM to generate detailed code annotations, but rapidly growing usage is driving annotation costs higher than budgeted. To reduce expenses without sacrificing annotation quality, the team decides to evaluate smaller, more cost-effective open-source models as alternatives.
Using W&B Inference, the team can quickly pick an alternative model. First, they navigate to the W&B Inference hosted models page and review supported open-source models, focusing on recently released, cost-effective options suitable for annotation tasks.

Exploring W&B Inference hosted models for some of the most popular and capable open-source LLMs available
Next, the team runs their existing annotation system prompt along with a representative sample user prompt directly in the W&B Weave Playground without the need for any configuration. With immediate side-by-side comparisons, they quickly identify which smaller open-source models perform comparably to the proprietary model.

Experimenting with multiple models side-by-side in the W&B Weave Playground to get a quick sense of the best model for a particular prompt
After this initial assessment, the team uses the integrated Weave capabilities to conduct a more comprehensive evaluation with historical prompts and compares the outputs produced by both the proprietary model and the selected open-source alternatives. This streamlined evaluation confirms which model best balances quality and cost.
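As a rough illustration of that step, the sketch below uses Weave's evaluation API to score a candidate model against historical prompts. The dataset rows, the scorer, and the candidate_model function are placeholders for the team's actual assets, and the scorer signature follows the convention in the current Weave documentation (dataset columns plus the model's output).

import asyncio
import weave
from weave import Evaluation

weave.init("<team>/<project>")

# Historical prompts with reference annotations (placeholder rows).
dataset = [
    {"prompt": "def add(a, b): return a + b", "reference": "Adds two numbers and returns the sum."},
]

@weave.op()
def annotation_quality(reference: str, output: str) -> dict:
    # Placeholder scorer: swap in an LLM judge or a proper similarity metric.
    return {"matches_reference": reference.lower() in output.lower()}

@weave.op()
def candidate_model(prompt: str) -> str:
    # Placeholder: call the selected open-source model via W&B Inference here.
    return "Adds two numbers and returns the sum."

evaluation = Evaluation(dataset=dataset, scorers=[annotation_quality])
asyncio.run(evaluation.evaluate(candidate_model))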
Once a model is selected, the team visits the Model Overview page directly from the Hosted Models interface, grabs a ready-to-use Python code sample, and quickly integrates the selected open-source model into their application. Weave captures traces in production for ongoing monitoring and continuous improvement. This entire process—evaluation through implementation—is accomplished seamlessly within W&B Inference, saving significant engineering time, simplifying observability, and substantially reducing annotation costs.
import openai

client = openai.OpenAI(
    # The custom base URL points to W&B Inference
    base_url="https://api.inference.wandb.ai/v1",
    # Get your API key from https://wandb.ai/authorize
    # Consider setting it in the environment as OPENAI_API_KEY instead for safety
    api_key="<your-api-key>",
    # Team and project are required for usage tracking
    default_headers={"OpenAI-Project": "<team>/<project>"},
)

response = client.chat.completions.create(
    model="meta-llama/Llama-3.1-8B-Instruct",
    messages=[
        {"role": "system", "content": "<system prompt goes here>"},
        {"role": "user", "content": "<user code goes here>"},
    ],
)

print(response.choices[0].message.content)
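For the production-monitoring piece, Weave can auto-instrument the OpenAI client, so adding a weave.init call to the integration above is typically all that is needed to capture each completion as a trace. A minimal sketch under that assumption, with the project name and prompts as placeholders:

import openai
import weave

# Initializing Weave patches the OpenAI client so the completion call below
# is logged as a trace for ongoing monitoring.
weave.init("<team>/<project>")

client = openai.OpenAI(
    base_url="https://api.inference.wandb.ai/v1",
    api_key="<your-api-key>",
    default_headers={"OpenAI-Project": "<team>/<project>"},
)

response = client.chat.completions.create(
    model="meta-llama/Llama-3.1-8B-Instruct",
    messages=[
        {"role": "system", "content": "<system prompt goes here>"},
        {"role": "user", "content": "<user code goes here>"},
    ],
)
print(response.choices[0].message.content)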
Optimize applications, evaluate effectively, and iterate quickly
W&B Inference includes a free tier with every Weights & Biases plan, so you can start exploring immediately at no additional upfront cost. Organizations on Pro and Enterprise plans can manage budgets with usage-based pricing per million input and output tokens, and detailed organization-wide reporting of token consumption by model is included in the billing account.
W&B Inference empowers you to optimize AI applications by easily exploring, evaluating, and using some of the latest open-source models that deliver high accuracy and offer out-of-the-box integration with W&B Weave LLM Observability tools. Additionally, explore new and emerging models without the delays or complexities of signing up for additional accounts, juggling multiple API keys, or incurring extra costs.
Starting today, June 17th, W&B Inference is available in public preview for all W&B Weave multi-tenant SaaS customers directly through the Weights & Biases console, Weave Playground, and Weave SDK. To learn more and get started, see the W&B Inference documentation and the W&B Inference pricing page or try a popular model like DeepSeek R1 through the W&B Weave Playground.
Iterate on AI agents and models faster. Try Weights & Biases today.