Chenmientan's workspace
Runs: 1

Name:
State: Crashed
Notes: -
User: chenmientan
Tags:
Created:
Runtime: 16h 35m 45s
Sweep: -

actor.clip: 0.2
actor.entropy.coef: 0.0001
actor.freeze_steps: 0
actor.gradient_checkpointing: true
actor.kl.coef: 0
actor.kl.loss_estimator: k2
actor.kl.reward_estimator: k1
actor.lr: 0.000001
actor.max_grad_norm: 1
actor.max_length_per_device: 8192
actor.model_name: Qwen/Qwen2.5-3B-Instruct
actor.offload_model: true
actor.offload_optimizer: true
actor.ref_model_name: Qwen/Qwen2.5-3B-Instruct
actor.rollout.env_path: envs/searchr1.py
actor.rollout.gpu_memory_utilization: 0.5
actor.rollout.group_filtering.lower: 0
actor.rollout.group_filtering.upper: 1
actor.rollout.n_turns: 4
actor.rollout.test_sampling_params.max_new_tokens: 512
actor.rollout.test_sampling_params.temperature: 0
actor.rollout.tp_size: 1
actor.rollout.train_sampling_params.max_new_tokens: 512
actor.rollout.train_sampling_params.temperature: 1
actor.save_dir: ckpts/qwen2.5-3b-inst_reinforce/actor
actor.save_optimizer: true
actor.sp_size: 1
actor.update_per_rollout: 1
actor.weight_decay: 0.01

adv.estimator: reinforce
adv.gamma: 1
adv.lamda: 1
adv.norm_var: false

critic.clip: 0.5
critic.gradient_checkpointing: true
critic.lr: 0.000005
critic.max_grad_norm: 1
critic.max_length_per_device: 8192
critic.model_name: Qwen/Qwen2.5-3B-Instruct
critic.offload_model: true
critic.offload_optimizer: true
critic.save_dir: ckpts/qwen2.5-3b-inst_reinforce/critic
critic.save_optimizer: true
critic.sp_size: 1
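
The dotted column names in this run (for example actor.kl.coef or actor.rollout.n_turns) read naturally as paths into a nested configuration. The following is a minimal sketch, assuming one simply wants to regroup the flat columns by prefix; the helper name nest is hypothetical, only a handful of values from the run are copied in, and nothing here is claimed to be how the training code itself loads its config.

```python
from typing import Any


def nest(flat: dict[str, Any]) -> dict[str, Any]:
    """Expand dotted keys such as 'actor.kl.coef' into nested dicts."""
    nested: dict[str, Any] = {}
    for dotted_key, value in flat.items():
        node = nested
        *parents, leaf = dotted_key.split(".")
        # Walk (or create) one dict level per prefix component.
        for part in parents:
            node = node.setdefault(part, {})
        node[leaf] = value
    return nested


# A few values copied from the run; the remaining columns follow the same pattern.
flat_config = {
    "actor.kl.coef": 0,
    "actor.kl.loss_estimator": "k2",
    "actor.kl.reward_estimator": "k1",
    "actor.rollout.n_turns": 4,
    "adv.estimator": "reinforce",
    "critic.lr": 0.000005,
}

config = nest(flat_config)
print(config["actor"]["kl"])
```

Grouping by prefix this way makes it easier to compare the actor, adv, and critic settings side by side than scanning a single wide table row.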