Panels: objective/scores, objective/kl, system/gpu.0.gpu, system/gpu.0.memoryAllocated
Legend:
exp_name: train_policy_accelerate (Run set 2)
exp_name: train_policy_accelerate, ppo.gradient_accumulation_steps: 1, base_model: EleutherAI/pythia-160m (Run set)
exp_name: train_policy_accelerate, ppo.gradient_accumulation_steps: 1, base_model: cerebras/Cerebras-GPT-111M (Run set)
exp_name: train_policy_accelerate, ppo.gradient_accumulation_steps: 64, base_model: gpt2 (Run set)
exp_name: train_policy_accelerate, ppo.gradient_accumulation_steps: 1, base_model: gpt2 (Run set)
Run set: 19
Run set 2: 5
Created with ❤️ on Weights & Biases.
https://wandb.ai/openrlbenchmark/lm_human_preference_details/reports/different-base-models--Vmlldzo1Mzg3NzY4