ppo_fast vs orz ppo + orz verifier
Created on May 8 | Last edited on May 8
Below is a comparison between oi's ppo_fast.py and Open Reasoner Zero's (orz) PPO. Note that orz's runtime appears slower partly because it runs evaluation inside the training loop, whereas our ppo_fast.py saves a checkpoint and launches separate jobs for evaluation.
Panels: orz PPO, ppo_fast