RULER

Automated reward function for reinforcement fine-tuning

Relative Universal LLM-Elicited Rewards (RULER) is a general-purpose reward function for reinforcement learning (RL) that uses an LLM-as-judge to rank multiple agent trajectories. It requires no labeled data, expert feedback, or handcrafted reward functions, yet reliably improves agent performance—matching or exceeding handcrafted rewards on 3 of 4 benchmarks. Define your task in the system prompt and RULER handles the rest. RULER is part of the Agent Reinforcement Trainer (ART) open-source framework and is now available on W&B Training Serverless RL.

Why RULER?

2-3x faster development

Skip reward function engineering entirely and cut implementation time by 2-3x compared to handcrafted rewards.

General-purpose

Works across tasks without modification: apply RULER to a wide range of RL tasks with a single line of code.

No labeled data required

RULER compares trajectories against each other with an LLM-as-judge. No manual labeling or synthetic data needed.

# Before: Hours of reward engineering
def complex_reward_function(trajectory):
    # 50+ lines of careful scoring logic...
    pass


# After: One line with RULER
judged_group = await ruler_score_group(group, "openai/o3")
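
To make the one-liner concrete, here is a minimal sketch of scoring a group of rollouts with RULER via the ART Python library. It assumes the art.Trajectory and art.TrajectoryGroup types and the ruler_score_group helper under art.rewards, and that trajectories is a list of completed rollouts for the same task; treat the exact names as assumptions based on the ART docs rather than a drop-in recipe.

import art
from art.rewards import ruler_score_group  # assumed import path per the ART docs


async def score_rollouts(trajectories: list[art.Trajectory]) -> None:
    # RULER ranks trajectories relative to each other, so all of them
    # should come from attempts at the same task.
    group = art.TrajectoryGroup(trajectories)

    # One LLM-as-judge call (here OpenAI's o3) assigns each trajectory a reward.
    judged_group = await ruler_score_group(group, "openai/o3")

    # Inspect the relative rewards (attribute name assumed from the ART docs).
    for trajectory in judged_group.trajectories:
        print(trajectory.reward)


# Example usage, given rollouts collected elsewhere:
# asyncio.run(score_rollouts(my_trajectories))

In a full training run, the judged group and its RULER-assigned rewards are what you would pass to ART's training step in place of a handcrafted reward function.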

Getting started

Simply pip install openpipe-art and try our sample notebook on W&B Training Serverless RL. Using RULER takes just one line of code. You can also head over to the RULER documentation to learn more.
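
For a quick smoke test after installing, a sketch like the following confirms the RULER helper is importable; the openpipe-art package name comes from this section, while the art.rewards module path is an assumption based on the ART docs.

# In a shell:
#   pip install openpipe-art

# Then, in Python:
from art.rewards import ruler_score_group  # assumed import path

print("RULER helper is available:", callable(ruler_score_group))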

Get started with Serverless RL