Reports
Created by
Created On
Last edited
Code generation and debugging with the Grok 4 API
Grok 4 tutorial: A step-by-step guide to using xAI’s Grok 4 for code generation and debugging, including OpenRouter setup and W&B Weave integration.
0
2025-07-10
Tutorial: Running inference with Kimi K2 using W&B Inference
Getting set up and running Kimi K2, Moonshot AI's advanced long-context language model, in Python using W&B Inference. We'll be working with the moonshotai/Kimi-K2-Instruct model.
1
2025-08-11
Anthropic acquires Humanloop. Your alternative is Weights & Biases.
Humanloop is shutting down. Meet your alternative: Weights & Biases. Discover how to seamlessly migrate your LLM workflows for tracking, evaluation, and monitoring.
0
2025-08-14
A guide to LLM debugging, tracing, and monitoring
Learn how to debug, trace, and monitor your LLM applications with W&B Weave. Gain visibility into performance, errors, and safety for reliable AI.
0
2025-08-11
Exploring LLM evaluations and benchmarking
Explore LLM evaluation and benchmarking essentials - ethical, factual, and performance assessments - with a step-by-step W&B Weave tutorial for AI leads.
0
2025-08-11
Integrating W&B Inference with Claude Code: A step-by-step guide
Save 70-80% on AI coding costs by integrating Claude Code with W&B Inference. Complete guide using official APIs - no complex proxy setup required.
1
2025-07-28
Amazon Bedrock AgentCore observability guide
Learn how Amazon Bedrock AgentCore enables secure, scalable AI agent deployment with built‑in observability—and how integrating W&B Weave enhances traceability and performance insights for developers.
3
2025-07-28
Using the Gemini embedding model to develop a RAG system with observability via W&B Weave
Build a powerful RAG system with Gemini embeddings (gemini-embedding-001) and monitor queries, retrievals, and outputs using W&B Weave.
0
2025-07-25
Tutorial: The OpenPipe ART project
Deep dive article about OpenPipe ART and training agents.
1
2025-07-23
Tutorial: MUVERA + Weights & Biases = Fast, scalable multi-vector retrieval
Learn how to implement MUVERA with Weights & Biases to build fast, scalable multi-vector retrieval systems. This hands-on tutorial covers theory, code, and performance tracking.
0
2025-07-22
Tracing your CrewAI application
Trace your CrewAI application with W&B Weave: Apply guardrails, visualize every agent decision, and debug multi-agent workflows for better performance.
0
2025-07-07
Tutorial: Kimi K2 for code generation with observability
Kimi K2 is Moonshot AI's 1T-parameter open model excelling in coding and reasoning. Explore its features, API deployment, and monitoring with W&B Weave.
1
2025-07-14
1
2025-07-09
NVIDIA GB300 NVL72: A leap in AI performance and efficiency
The NVIDIA GB300 NVL72 is delivered as a fully liquid-cooled, rack-scale system. Its dense design unifies cutting-edge GPUs, CPUs, and networking to push AI performance to new heights.
0
2025-07-04
What is LLM observability?
Deploying LLM applications into production is complex. This guide explains LLM observability - why it matters, common failure modes like hallucinations, key tool features, and how to get started with W&B Weave.
0
2025-06-17
NVIDIA Blackwell GPU architecture: Unleashing next‑gen AI performance
Blackwell GPU: NVIDIA’s next-gen architecture with multi-die design, FP4 precision, NVLink-5, and GB200 Superchip powering unparalleled AI training and real-time inference.
1
2025-05-16
LLM observability: Enhancing AI systems with W&B Weave
Explore the essentials of LLM observability, key challenges, and how tools like W&B Weave help AI teams monitor, debug, and optimize large language models for performance, reliability, and ethical compliance.
0
2025-05-08
LLM evaluation metrics: A comprehensive guide for large language models
Learn how to evaluate Large Language Models (LLMs) effectively. This guide covers automatic & human-aligned metrics (BLEU, ROUGE, factuality, toxicity), RAG, code generation, and W&B Guardrail examples.
1
2025-05-03
LLM evaluation: Metrics, frameworks, and best practices
A comprehensive guide on LLM evaluation, exploring key metrics, human and automated methods of evaluation, best practices, and how to leverage W&B Weave for continuous improvement.
0
2025-02-12
AI agent evaluation: Metrics, strategies, and best practices
Evaluate your AI agents effectively with a comprehensive guide on key metrics, evaluation strategies, and a beginner-friendly W&B Weave tutorial.
1
2025-04-17