Relative Universal LLM-Elicited Rewards (RULER) is a general-purpose reward function for reinforcement learning (RL) that uses an LLM-as-judge to rank multiple agent trajectories. It requires no labeled data, expert feedback, or handcrafted reward functions, yet reliably improves agent performance, matching or exceeding handcrafted rewards on 3 of 4 benchmarks. Define your task in the system prompt and RULER handles the rest. RULER is part of the Agent Reinforcement Trainer (ART) open-source framework and is now available on Serverless RL.
2-3x faster development
Skip reward function engineering entirely. Reduce implementation time by 2-3x compared to handcrafted rewards.
General-purpose
Works across tasks without task-specific modification. Apply it to a wide range of RL tasks with a single line of code.
No labeled data required
RULER compares trajectories against each other with an LLM-as-judge. No manual labeling or synthetic data needed.
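The core idea of relative judging is that every trajectory for the same task goes to the judge together, so each one is scored against its siblings rather than against an absolute rubric. The sketch below illustrates that idea only; it is not ART's implementation. The helpers `build_judge_prompt` and `parse_scores`, the prompt wording, and the `{"scores": [...]}` reply format are all hypothetical assumptions, and the judge reply is a hard-coded stand-in rather than a real model call.

```python
import json
from typing import Sequence


def build_judge_prompt(task: str, trajectories: Sequence[str]) -> str:
    """Hypothetical helper: put every trajectory for the same task into one
    prompt so the judge compares them against each other."""
    lines = [
        f"Task: {task}",
        "Below are several attempts at the same task.",
        "Score each attempt from 0 to 1 relative to the others.",
        'Reply with JSON only: {"scores": [s1, s2, ...]} in the same order.',
        "",
    ]
    for i, traj in enumerate(trajectories, start=1):
        lines.append(f"<attempt id={i}>\n{traj}\n</attempt>")
    return "\n".join(lines)


def parse_scores(reply: str, expected: int) -> list[float]:
    """Hypothetical helper: parse the judge's JSON reply and check that
    there is exactly one in-range score per trajectory."""
    scores = [float(s) for s in json.loads(reply)["scores"]]
    if len(scores) != expected:
        raise ValueError(f"expected {expected} scores, got {len(scores)}")
    if any(not 0.0 <= s <= 1.0 for s in scores):
        raise ValueError("scores must lie in [0, 1]")
    return scores


attempts = [
    "Answered correctly in 2 steps.",
    "Gave up.",
    "Answered correctly in 6 steps.",
]
prompt = build_judge_prompt("Find the capital of France", attempts)
# A real run would send `prompt` to a judge model; this reply is a stand-in.
fake_reply = '{"scores": [0.9, 0.1, 0.7]}'
print(parse_scores(fake_reply, len(attempts)))  # [0.9, 0.1, 0.7]
```

Showing all attempts at once lets the judge reward the faster correct answer over the slower one without anyone writing a rule about step counts.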
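```python
from art.rewards import ruler_score_group

# Before: hours of reward engineering
def complex_reward_function(trajectory):
    # 50+ lines of careful scoring logic...
    pass

# After: one line with RULER
# (top-level await works in a notebook; `group` holds trajectories for one task)
judged_group = await ruler_score_group(group, "openai/o3")
```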
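RULER's scores only need to be meaningful relative to the other trajectories in the same group. One way to see why that is enough is a GRPO-style group normalization, in which each score is centered on its group's mean and divided by the group's standard deviation. The sketch below is a generic illustration under that assumption, not ART's training code, and `group_advantages` is a hypothetical helper. It shows that if the judge is uniformly more generous to every trajectory, the resulting advantages do not change.

```python
from statistics import mean, pstdev


def group_advantages(scores: list[float], eps: float = 1e-6) -> list[float]:
    """Hypothetical helper: GRPO-style advantages, i.e. each score minus the
    group mean, divided by the group's population standard deviation."""
    mu = mean(scores)
    sigma = pstdev(scores)
    return [(s - mu) / (sigma + eps) for s in scores]


scores = [0.9, 0.1, 0.7]
shifted = [s + 0.05 for s in scores]  # judge is uniformly more generous

a = group_advantages(scores)
b = group_advantages(shifted)
# A constant offset cancels out, so only the relative ordering and spread matter.
print(all(abs(x - y) < 1e-9 for x, y in zip(a, b)))  # True
```

The small `eps` only guards against division by zero when every trajectory in a group receives the same score.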
Install the library with pip install openpipe-art, then try the sample notebook on Serverless RL; using RULER takes a single line of code. To learn more, see the RULER documentation.