RULER
Automated reward function for reinforcement fine-tuning
Relative Universal LLM-Elicited Rewards (RULER) is a general-purpose reward function for reinforcement learning (RL) that uses an LLM-as-judge to rank multiple agent trajectories. It requires no labeled data, expert feedback, or handcrafted reward functions, yet reliably improves agent performance—matching or exceeding handcrafted rewards on 3 of 4 benchmarks. Define your task in the system prompt and RULER handles the rest. RULER is part of the Agent Reinforcement Trainer (ART) open-source framework and is now available on W&B Training Serverless RL.

Why ART?
Built for real-world agents
Real user interactions are multi-turn. ART supports multi-turn rollouts so your agent learns from realistic conversations and performs reliably.
Drop‑in integration
An OpenAI‑compatible chat endpoint slots straight into your existing code or frameworks like CrewAI, OpenAI Agents SDK, and LangGraph (see the sketch below).
Works flexibly with existing code
ART provides wrappers to plug RL training into existing apps and abstracts the training server into a modular service that your application code never has to touch directly.
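
To make the drop-in claim concrete, here is a minimal sketch of a multi-turn rollout that talks to an OpenAI-compatible chat endpoint. The base URL, API key, model name, system prompt, and stopping condition are illustrative placeholders, not ART's exact configuration:

from openai import AsyncOpenAI

# Placeholder endpoint; in practice ART provides the inference endpoint
# for the model being trained.
client = AsyncOpenAI(base_url="http://localhost:8000/v1", api_key="placeholder")

async def rollout(task: str, max_turns: int = 4) -> list[dict]:
    # The task definition lives in the system prompt, as RULER expects.
    messages = [
        {"role": "system", "content": "You are a research agent. Solve the user's task."},
        {"role": "user", "content": task},
    ]
    for _ in range(max_turns):
        response = await client.chat.completions.create(
            model="my-agent",  # placeholder model name
            messages=messages,
        )
        reply = response.choices[0].message.content or ""
        messages.append({"role": "assistant", "content": reply})
        if "DONE" in reply:  # toy stopping condition
            break
        # Next turn; in a real agent this comes from users or tool results.
        messages.append({"role": "user", "content": "Continue."})
    return messages

Because the endpoint speaks the OpenAI chat protocol, the same loop works whether the calls come from your own code or from a framework such as CrewAI, the OpenAI Agents SDK, or LangGraph.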
# Before: Hours of reward engineering
def complex_reward_function(trajectory):
    # 50+ lines of careful scoring logic...
    pass

# After: One line with RULER (inside an async function or a notebook cell;
# `group` is a set of candidate trajectories for the same task)
judged_group = await ruler_score_group(group, "openai/o3")
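
For context, the sketch below fills in where `group` comes from. It assumes the art.Trajectory and art.TrajectoryGroup types and the ruler_score_group import path shown in ART's examples; treat the exact names and fields as illustrative and check the RULER documentation for the current API.

import art
from art.rewards import ruler_score_group

# Placeholder candidate answers; in practice these are sampled from the
# model being trained.
candidate_answers = ["Andrew Hunt and David Thomas.", "Martin Fowler.", "I'm not sure."]

# One trajectory per candidate, all for the same task.
trajectories = [
    art.Trajectory(
        messages_and_choices=[
            {"role": "system", "content": "You are a research agent. Answer the user's question."},
            {"role": "user", "content": "Who wrote 'The Pragmatic Programmer'?"},
            {"role": "assistant", "content": answer},
        ],
        reward=0.0,  # placeholder; RULER assigns the actual score
    )
    for answer in candidate_answers
]
group = art.TrajectoryGroup(trajectories)

# One line (inside an async function or a notebook cell): the judge model
# compares the trajectories against each other and scores each one.
judged_group = await ruler_score_group(group, "openai/o3")

The scored group then feeds back into the RL training step, with the judge's relative scores acting as the reward signal.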
Getting started
Simply pip install openpipe-art and try our sample notebook example [UPDATE LINK] on W&B Training Serverless RL. Using RULER takes just one line of code. Head over to the RULER documentation to learn more.