Project: openrlbenchmark / lm_human_preference_details
refactor
Costa, created on September 22 (last edited on September 22)
[Panel: objective/scores vs. global_step]
Run set: exp_name: train_policy_accelerate, ppo.gradient_accumulation_steps: 1, base_model: gpt2
Run set 2: exp_name: train_policy_accelerate
[Panel: objective/kl vs. global_step]
[Panel: system/gpu.0.gpu vs. Time (minutes)]
[Panel: system/gpu.0.memoryAllocated vs. Time (minutes)]
Run set (5 runs)
Run set 2 (4 runs)