reward model training

Created on June 16 | Last edited on July 7


[Plots: two line charts, one against Step (0 to 140) and one against Time (seconds) (100 to 400); y-axis ticks run from 1 to 1.8.]

Run sets: my attempts (4), openai original codebase (40), before refactor (10)

Run sets: my attempts (1), openai original codebase (41)