
ppo_fast vs orz ppo + orz verifier

Created on May 8 | Last edited on May 8
Below is a comparison between oi's ppo_fast.py and Open Reasoner Zero's (orz) PPO. Note that orz's runtime appears slower partly because it runs evaluation inside the training loop, whereas our ppo_fast.py only saves a checkpoint and launches separate jobs for evaluation.
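To make the runtime caveat concrete, here is a minimal sketch of a wall-clock cost model. It assumes training time is simply the per-step cost plus whatever blocking work happens every `eval_every` steps. The function names, parameters, and numbers are hypothetical and illustrative only, not measured from either run.

```python
# Hypothetical cost model; all costs are illustrative, not measured from orz or ppo_fast.

def in_loop_wall_clock(num_steps: int, step_cost: float, eval_every: int, eval_cost: float) -> float:
    """In-loop eval: training blocks while eval runs, so eval time adds to wall-clock."""
    num_evals = num_steps // eval_every
    return num_steps * step_cost + num_evals * eval_cost


def offloaded_wall_clock(num_steps: int, step_cost: float, eval_every: int, ckpt_cost: float) -> float:
    """Offloaded eval: training only pays to save a checkpoint; eval runs in separate jobs."""
    num_ckpts = num_steps // eval_every
    return num_steps * step_cost + num_ckpts * ckpt_cost


if __name__ == "__main__":
    print(in_loop_wall_clock(1000, 1.0, 100, 30.0))   # 1300.0
    print(offloaded_wall_clock(1000, 1.0, 100, 2.0))  # 1020.0
```

With these made-up costs, 1000 steps with a checkpoint or eval every 100 steps take 1300 time units when eval runs in-loop versus 1020 when only a checkpoint is saved. The gap grows with eval cost and frequency, which is why per-hour curves overstate orz's training slowdown relative to per-step curves.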


[Charts: ten panels comparing the runs "orz PPO" and "ppo_fast", plotted against Time (hours) or global_step; panel titles are not recoverable.]