Tradeshow

NeurIPS Conference 2025

Schedule an in-person meeting or demo

About the event

Weights & Biases is heading to NeurIPS 2025 to connect with the brightest minds shaping the future of AI. Drop by our booth to explore how teams are not just building and evaluating models, but also deploying, governing, and optimizing next-generation systems with Weights & Biases. From fine-tuning massive LLMs to orchestrating agentic workflows, managing synthetic-data pipelines, and driving model lifecycle observability, we’d love to talk about how we can help accelerate your work in this new era.

About our session

Tuesday (12/02) 2:30pm – Measuring Emergent Behavior in AI Agents

As language models transition into agents, they exhibit behaviors that were not explicitly trained—emergent dynamics that are powerful yet poorly understood. Measuring these behaviors requires dedicated tooling that treats evaluation as a central research problem rather than a peripheral task. This talk introduces frameworks for self-improving agents that generate candidate variants, run structured experiments, and incorporate evaluation feedback into iterative refinement. Such loops operationalize the scientific method in software, enabling agents to improve through cycles of hypothesis, measurement, and revision. Tooling for evaluation plays a critical role in this process, transforming measurement from a diagnostic exercise into an engine for discovery. Early experiments reveal both hidden failure modes and novel capabilities, underscoring the need to build for emergence as an active research objective. The talk concludes by outlining a research agenda in which evaluation frameworks provide the substrate for cultivating reliable, trustworthy agent systems.
