RULER

Automated reward function for reinforcement fine-tuning

Relative Universal LLM-Elicited Rewards (RULER) is a general-purpose reward function for reinforcement learning (RL) that uses an LLM-as-judge to rank multiple agent trajectories. It requires no labeled data, expert feedback, or handcrafted reward functions, yet reliably improves agent performance—matching or exceeding handcrafted rewards on 3 of 4 benchmarks. Define your task in the system prompt and RULER handles the rest. RULER is part of the Agent Reinforcement Trainer (ART) open-source framework and is now available on W&B Training Serverless RL.

Why RULER?

2-3x faster development

Skip reward function engineering entirely and cut implementation time by 2-3x compared to handcrafted rewards.

General-purpose

Works across tasks without modification: apply RULER to a wide range of RL tasks with a single line of code.

No labeled data required

RULER compares trajectories against each other with an LLM-as-judge. No manual labeling or synthetic data needed.

# Before: Hours of reward engineering
def complex_reward_function(trajectory):
    # 50+ lines of careful scoring logic...
    pass


# After: One line with RULER
judged_group = await ruler_score_group(group, "openai/o3")
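
To make the one-liner concrete, here is a minimal sketch of scoring a group of rollouts with RULER via the ART Python library. It assumes the art.Trajectory and art.TrajectoryGroup types and the ruler_score_group helper under art.rewards, and that trajectories is a list of completed rollouts for the same task; treat the exact names as assumptions based on the ART docs rather than a drop-in recipe.

import art
from art.rewards import ruler_score_group  # assumed import path per the ART docs


async def score_rollouts(trajectories: list[art.Trajectory]) -> None:
    # RULER ranks trajectories relative to each other, so all of them
    # should come from attempts at the same task.
    group = art.TrajectoryGroup(trajectories)

    # One LLM-as-judge call (here OpenAI's o3) assigns each trajectory a reward.
    judged_group = await ruler_score_group(group, "openai/o3")

    # Inspect the relative rewards (attribute name assumed from the ART docs).
    for trajectory in judged_group.trajectories:
        print(trajectory.reward)


# Example usage, given rollouts collected elsewhere:
# asyncio.run(score_rollouts(my_trajectories))

In a full training run, the judged group and its RULER-assigned rewards are what you would pass to ART's training step in place of a handcrafted reward function.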

Getting started

Simply pip install openpipe-art and try our sample notebook on W&B Training Serverless RL. Using RULER takes just one line of code. You can also head over to the RULER documentation to learn more.
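
For a quick smoke test after installing, a sketch like the following confirms the RULER helper is importable; the openpipe-art package name comes from this section, while the art.rewards module path is an assumption based on the ART docs.

# In a shell:
#   pip install openpipe-art

# Then, in Python:
from art.rewards import ruler_score_group  # assumed import path

print("RULER helper is available:", callable(ruler_score_group))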

Get started with Serverless RL