Relative Universal LLM-Elicited Rewards (RULER) is a general-purpose reward function for reinforcement learning (RL) that uses an LLM-as-judge to rank multiple agent trajectories. It requires no labeled data, expert feedback, or handcrafted reward functions, yet reliably improves agent performance—matching or exceeding handcrafted rewards on 3 of 4 benchmarks. Define your task in the system prompt and RULER handles the rest. RULER is part of the Agent Reinforcement Trainer (ART) open-source framework and is now available on W&B Training Serverless RL.
2-3x faster development
Skip reward function engineering entirely. Reduce implementation time by 2-3x compared to hand-crafted rewards.
General-purpose
Works across any task without modification. Apply it to a wide range of RL tasks with a single line of code.
No labeled data required
RULER compares trajectories against each other with an LLM-as-judge. No manual labeling or synthetic data needed.
# Before: Hours of reward engineering
def complex_reward_function(trajectory):
    # 50+ lines of careful scoring logic...
    pass

# After: One line with RULER
from art.rewards import ruler_score_group

judged_group = await ruler_score_group(group, "openai/o3")

Simply pip install openpipe-art and try our sample notebook example on W&B Training Serverless RL. It takes just one line of code to use RULER. You can also head over to the RULER documentation to learn more.
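For context, here is a rough sketch of where that one line sits inside an ART rollout-and-train loop. The rollout helper and scenario string are hypothetical, and the ART calls shown (art.Trajectory, art.TrajectoryGroup, model.train, and the ruler_score_group import path) follow the ART documentation but should be checked against the current openpipe-art release:

# Hedged sketch: one RULER-scored training step with openpipe-art (ART).
# The ART APIs used here are assumptions based on the ART documentation.
import art
from art.rewards import ruler_score_group

async def rollout(model: art.TrainableModel, scenario: str) -> art.Trajectory:
    # Hypothetical helper: run the agent once on a scenario and record the
    # conversation. Rewards start at 0; RULER assigns them later.
    messages = [
        {"role": "system", "content": "You are a helpful agent. <task definition goes here>"},
        {"role": "user", "content": scenario},
    ]
    # ... call the model, run tools, and append the resulting messages/choices ...
    return art.Trajectory(messages_and_choices=messages, reward=0.0)

async def train_one_step(model: art.TrainableModel, scenario: str) -> None:
    # Collect several independent attempts at the same scenario.
    trajectories = [await rollout(model, scenario) for _ in range(8)]
    group = art.TrajectoryGroup(trajectories)

    # RULER asks an LLM judge to rank the trajectories against one another;
    # the relative scores become the rewards, with no labels or handcrafted
    # reward function involved.
    judged_group = await ruler_score_group(group, "openai/o3")

    # Run the RL update on the judged group.
    await model.train([judged_group])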