TinyZero-R1-Countdown Qilong Wu
Reproducing R1: RL boosts the reasoning ability of LLMs and enables testing for the 'aha' moment on the Countdown task.
Created on January 28 | Last edited on January 28
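For readers unfamiliar with the Countdown task mentioned above, the sketch below illustrates the kind of rule-based check that can score a model's answer: the model is given a list of numbers and a target, and must produce an arithmetic expression that reaches the target. This is only an illustration under stated assumptions, not the reward function used in these runs. The function name `countdown_reward`, the 0/1 scoring, and the rule that every number must be used exactly once are all assumptions made for the example.

```python
import ast
import operator
from collections import Counter

# Only the four basic arithmetic operators are allowed (an assumption of this sketch).
_OPS = {
    ast.Add: operator.add,
    ast.Sub: operator.sub,
    ast.Mult: operator.mul,
    ast.Div: operator.truediv,
}


def _evaluate(node, used):
    """Recursively evaluate a parsed expression, recording every integer literal in `used`."""
    if isinstance(node, ast.Expression):
        return _evaluate(node.body, used)
    if isinstance(node, ast.Constant) and type(node.value) is int:
        used.append(node.value)
        return node.value
    if isinstance(node, ast.BinOp) and type(node.op) in _OPS:
        left = _evaluate(node.left, used)
        right = _evaluate(node.right, used)
        return _OPS[type(node.op)](left, right)
    # Anything else (function calls, names, unary operators, ...) is rejected.
    raise ValueError("unsupported expression")


def countdown_reward(expression, numbers, target):
    """Hypothetical 0/1 reward: 1.0 if `expression` uses each given number
    exactly once and evaluates to `target`, otherwise 0.0."""
    used = []
    try:
        value = _evaluate(ast.parse(expression, mode="eval"), used)
    except (SyntaxError, ValueError, ZeroDivisionError):
        return 0.0
    if Counter(used) != Counter(numbers):
        return 0.0
    return 1.0 if abs(value - target) < 1e-6 else 0.0
```

For example, `countdown_reward("(25 - 5) * 3 + 4", [3, 4, 5, 25], 64)` scores a correct answer, while an expression that skips one of the given numbers or misses the target scores zero. Parsing with `ast` instead of calling `eval` keeps the check limited to plain arithmetic on model-generated text.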
1. Actor
Run set: 5 runs
2. Critic
Run set: 5 runs
3. Response Length
Run set: 5 runs
4. Generation Speed
Run set: 5 runs
5. Training Speed
Run set: 5 runs
6. GPU System
Run set: 5 runs