Chenmientan's workspace
Runs (2)
Field                                   Run 1                 Run 2
State                                   Crashed               Crashed
Notes                                   -                     -
User                                    chenmientan           chenmientan
Tags
Created
Runtime                                 2h 35m 17s            2h 42m 16s
Sweep                                   -                     -
actor.avg_level                         token                 token
actor.clip                              0.2                   0.2
actor.ddp_size                          1                     1
actor.entropy.coef                      0                     0
actor.freeze_steps                      0                     0
actor.gradient_checkpointing            true                  true
actor.kl.coef                           0                     0
actor.kl.loss_estimator                 k2                    k2
actor.kl.reward_estimator               k1                    k1
actor.lr                                0.000001              0.000001
actor.max_grad_norm                     1                     1
actor.max_inference_length_per_device   8192                  8192
actor.max_length_per_device             8192                  8192
actor.model_name                        Qwen/Qwen3-1.7B-Base  Qwen/Qwen3-1.7B-Base
actor.offload_model                     true                  true
actor.offload_optimizer                 true                  true
actor.scheduler                         constant              constant
actor.sp_size                           1                     1
actor.temperature                       1                     1
actor.tis_coef                          2                     2
actor.tp_size                           1                     1
actor.update_per_rollout                1                     1
actor.use_liger_kernel                  false                 false
actor.warmup_ratio                      0.1                   0.1
actor.weight_decay                      0.01                  0.01
adv.estimator                           gae                   reinforce
adv.gamma                               1                     1
adv.global_norm                         true                  true
adv.lamda                               1                     1
adv.norm_var                            true                  true
critic.avg_level                        token                 token
critic.clip                             0.5                   0.5
critic.ddp_size                         1                     1
critic.gradient_checkpointing           true                  true
critic.lr                               0.000005              0.000005
critic.max_grad_norm                    1                     1
critic.max_inference_length_per_device  8192                  8192
critic.max_length_per_device            8192                  8192
critic.model_name                       Qwen/Qwen3-1.7B-Base  Qwen/Qwen3-1.7B-Base
critic.offload_model                    true                  true
critic.offload_optimizer                true                  true
critic.scheduler                        constant              constant
critic.sp_size                          1                     1
critic.tp_size                          1                     1
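
The two runs above are identical in every listed hyperparameter except adv.estimator (gae in the first, reinforce in the second). With this many columns the difference is easy to miss, so a small comparison helper can be useful. The following is a minimal sketch, not part of the workspace itself: the function name diff_configs and the dictionaries are hypothetical, and only a subset of the values from the table is copied in for brevity.

```python
from typing import Any


def diff_configs(a: dict[str, Any], b: dict[str, Any]) -> dict[str, tuple[Any, Any]]:
    """Return {key: (value_in_a, value_in_b)} for every key whose values differ.

    A key that is absent from one config is reported with None on that side.
    """
    differences = {}
    for key in sorted(set(a) | set(b)):
        if a.get(key) != b.get(key):
            differences[key] = (a.get(key), b.get(key))
    return differences


# Subset of the hyperparameters shared by both runs (values copied from the table).
shared = {
    "actor.lr": 0.000001,
    "actor.clip": 0.2,
    "actor.model_name": "Qwen/Qwen3-1.7B-Base",
    "adv.gamma": 1,
    "adv.lamda": 1,
    "critic.lr": 0.000005,
    "critic.clip": 0.5,
}

# The two runs, in the order they are listed; only adv.estimator differs.
run_1 = {**shared, "adv.estimator": "gae"}
run_2 = {**shared, "adv.estimator": "reinforce"}

print(diff_configs(run_1, run_2))
# {'adv.estimator': ('gae', 'reinforce')}
```

Sorting the union of keys keeps the output stable across runs, which makes it easier to compare more than one pair of configurations side by side.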