reward model training
Costa
Created on June 16 | Last edited on July 7
[Figure: loss, train_reward/minibatch/loss. x-axis: Step]
[Figure: loss, train_reward/minibatch/loss. x-axis: Time (seconds)]
Run sets: my attempts (4), openai original codebase (40), before refactor (10)
Run sets: my attempts (1), openai original codebase (41)