RULER

Automated reward function for reinforcement fine-tuning

Relative Universal LLM-Elicited Rewards (RULER) is a general-purpose reward function for reinforcement learning (RL) that uses an LLM-as-judge to rank multiple agent trajectories. It requires no labeled data, expert feedback, or handcrafted reward functions, yet reliably improves agent performance—matching or exceeding handcrafted rewards on 3 of 4 benchmarks. Define your task in the system prompt and RULER handles the rest. RULER is part of the Agent Reinforcement Trainer (ART) open-source framework and is now available on W&B Training Serverless RL.


Why ART?

Built for real-world agents

Real user interactions are multi-turn. ART supports multi-turn rollouts so your agent learns from realistic conversations and performs reliably.
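To make "multi-turn" concrete: the unit RULER judges is the agent's entire message history across turns, not a single completion. A minimal sketch in standard OpenAI chat format (the scenario and messages below are invented for illustration):

# A hypothetical two-turn rollout. RULER scores the whole trajectory,
# including how the agent handled the user's follow-up correction.
trajectory_messages = [
    {"role": "system", "content": "You are an email-search agent."},
    {"role": "user", "content": "Find the invoice from Acme."},
    {"role": "assistant", "content": "Searching your inbox for 'Acme invoice'..."},
    {"role": "user", "content": "Only check the last 30 days."},
    {"role": "assistant", "content": "Found 'Acme Invoice #1042', sent 12 days ago."},
]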

Drop-in integration

ART's OpenAI-compatible chat endpoint slots straight into your existing code or into frameworks like CrewAI, the OpenAI Agents SDK, and LangGraph.
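Because the endpoint speaks the OpenAI chat protocol, pointing an existing agent at an ART-served model is a client-configuration change. A minimal sketch using the official openai Python client (the base_url, api_key, and model name are placeholders for whatever your ART server exposes):

import openai

# The standard OpenAI client, redirected at an ART inference server.
# base_url and api_key here are placeholders, not real credentials.
client = openai.AsyncOpenAI(
    base_url="http://localhost:8000/v1",
    api_key="unused",
)

async def chat(messages):
    response = await client.chat.completions.create(
        model="my-agent",  # whatever name your model is served under
        messages=messages,
    )
    return response.choices[0].message.content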

Works flexibly with existing code

ART provides wrappers to plug RL training into existing apps and abstracts the training server into a modular service your code needn’t touch.

# Before: Hours of reward engineering
def complex_reward_function(trajectory):
    # 50+ lines of careful scoring logic...
    pass


# After: One line with RULER
from art.rewards import ruler_score_group  # included with openpipe-art

judged_group = await ruler_score_group(group, "openai/o3")
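In context, RULER replaces the reward-engineering step of a standard RL loop: sample several rollouts for the same scenario, let the judge rank them against each other, then train on the judged group. A hedged sketch of that loop (ruler_score_group is the documented call; the rollout helper, group size, and surrounding structure are illustrative assumptions):

import art
from art.rewards import ruler_score_group

async def train_on_scenario(model: art.TrainableModel, scenario: str):
    # 1. Roll out the agent several times on the same scenario so the
    #    judge has siblings to rank (rollout() is your own agent loop).
    trajectories = [await rollout(model, scenario) for _ in range(8)]
    group = art.TrajectoryGroup(trajectories)

    # 2. One LLM-as-judge call assigns relative rewards to the group.
    judged_group = await ruler_score_group(group, "openai/o3")

    # 3. Train on those rewards; no labels or handcrafted scoring needed.
    await model.train([judged_group])

Because the judge ranks trajectories relative to one another rather than scoring each in isolation, it needs no task-specific rubric, which is what lets the same reward function carry across tasks.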

Getting started

Simply pip install openpipe-art and try our sample notebook example [UPDATE LINK] on W&B Training Serverless RL. Using RULER takes just one line of code. Head over to the RULER documentation to learn more.
