Chenmientan's workspace
Runs (1)

Run metadata
State                          Finished
Notes                          -
User                           chenmientan
Runtime                        29m 24s
Sweep                          -

Config
actor.gradient_checkpointing   true
actor.lora.alpha               16
actor.lora.dropout             0
actor.lora.rank                0
actor.lora.target_modules      all-linear
actor.lr                       5.0000e-7
actor.max_grad_norm            1
actor.max_length_per_device    8192
actor.model_name               allenai/Llama-3.1-Tulu-3-8B-SFT
actor.offload_model            false
actor.offload_optimizer        false
actor.ref_model_name           allenai/Llama-3.1-Tulu-3-8B-SFT
actor.save_dir                 ckpts/tulu-3-8b
actor.save_optimizer           true
actor.sp_size                  1
actor.warmup_ratio             0.1
actor.weight_decay             0.01
data.batch_size                128
data.max_length                1024
data.path                      Chenmien/UltraFeedback
trainer.alpha                  0
trainer.beta                   0.1
trainer.disable_wandb          false
trainer.experiment_name        tulu-3-8b
trainer.n_epochs               1
trainer.project                UltraFeedback

Summary metrics
accuray                        0.62025
grad_norm                      35.43274
loss                           0.41071
rewards/chosen                 -0.37245
rewards/margin                 0.38284
rewards/rejected               -0.755
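The summary metrics above follow the naming of preference-optimization trainers: rewards/chosen, rewards/rejected, rewards/margin, an accuracy (logged here as "accuray"), and a loss computed against a reference model (actor.ref_model_name).

The sketch below shows how such metrics are usually derived. It assumes the trainer optimizes the standard DPO objective, with beta taken from trainer.beta = 0.1. That assumption is not confirmed by the run. The function dpo_batch_metrics and its inputs (summed sequence log-probabilities for each preference pair) are hypothetical. The role of trainer.alpha is not stated in the run, so it is left out.

```python
import math

def log_sigmoid(x: float) -> float:
    # Numerically stable log(sigmoid(x)) for both signs of x.
    if x >= 0:
        return -math.log1p(math.exp(-x))
    return x - math.log1p(math.exp(x))

def dpo_batch_metrics(policy_chosen, policy_rejected, ref_chosen, ref_rejected, beta=0.1):
    """Hypothetical helper, assuming standard DPO.

    Each argument is a list of summed sequence log-probs, one entry per
    preference pair; beta plays the role of trainer.beta.
    """
    # Implicit rewards: beta * (log pi_theta(y|x) - log pi_ref(y|x)).
    chosen = [beta * (p - r) for p, r in zip(policy_chosen, ref_chosen)]
    rejected = [beta * (p - r) for p, r in zip(policy_rejected, ref_rejected)]
    margins = [c - r for c, r in zip(chosen, rejected)]
    n = len(margins)
    return {
        # DPO loss: mean of -log sigmoid(margin) over the batch.
        "loss": sum(-log_sigmoid(m) for m in margins) / n,
        "rewards/chosen": sum(chosen) / n,
        "rewards/rejected": sum(rejected) / n,
        "rewards/margin": sum(margins) / n,
        # Fraction of pairs where the chosen response gets the higher reward.
        "accuracy": sum(m > 0 for m in margins) / n,
    }

# Illustrative inputs only; these are not values from this run.
metrics = dpo_batch_metrics(
    policy_chosen=[-10.0, -12.0],
    policy_rejected=[-14.0, -11.0],
    ref_chosen=[-11.0, -12.5],
    ref_rejected=[-13.0, -11.5],
)
print(metrics)
```

Within a single batch, this sketch gives rewards/margin = rewards/chosen - rewards/rejected exactly. The run's final values do not match this exactly: -0.37245 - (-0.755) = 0.38255, against a logged margin of 0.38284. The gap may reflect how the trainer averages metrics across steps or devices.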