R1-Zero-like training for math
Created on April 3 | Last edited on April 27
Big-Math-RL-Verified experiments
- Experiments run on a preprocessed version of the SynthLabsAI/Big-Math-RL-Verified dataset.
Phase 1: stabilise training
Phase 2: tune for performance
DAPO Math 17k experiments
Fast generation