
TinyZero-R1-Countdown
Qilong Wu

Reproducing R1: RL boosts the reasoning ability of an LLM and enables the 'aha' moment, tested on the Countdown task.
Created on January 28 | Last edited on January 28

1. Actor


[Figure: six actor-metric panels plotted against Step (x-axis from 50 to 250).]
Run set: 5 runs


2. Critic




3. Response Length




4. Generation Speed




5. Training Speed




6. GPU System

