
R1-Zero-like training for math

Created on April 3 | Last edited on April 27

Big-Math-RL-Verified experiments

Phase 1: stabilise training

Phase 2: tune for performance

DAPO Math 17k experiments

Dr GRPO

Overlong filtering

Num iterations

Prompt batch size

Num tokens

Ref model

Fast generation