Fanbinlu's workspace
Runs: 3

The table is transposed: one row per column of the original runs table, one column per run. Empty cells had no value in the source.

| Field | Run 1 | Run 2 | Run 3 |
| --- | --- | --- | --- |
| State | Finished | Crashed | Finished |
| Notes | | | |
| User | fanbinlu | fanbinlu | fanbinlu |
| Tags | | | |
| Created | | | |
| Runtime | 3s | 21h 49m 45s | 12h 44m 59s |
| Sweep | - | - | - |
| algorithm.adv_estimator | - | grpo | grpo |
| algorithm.disable_kl | - | true | true |
| algorithm.enable_replay | - | false | true |
| algorithm.gamma | - | 1 | 1 |
| algorithm.kl_coef | - | 0 | 0 |
| algorithm.kl_horizon | - | 0 | 0 |
| algorithm.kl_penalty | - | low_var_kl | low_var_kl |
| algorithm.kl_target | - | 0 | 0 |
| algorithm.kl_type | - | fixed | fixed |
| algorithm.lam | - | 1 | 1 |
| algorithm.use_kl_loss | - | true | true |
| data.answer_key | - | answer | answer |
| data.format_prompt | - | You are helpful assistant. | You are helpful assistant. |
| data.image_key | - | images | images |
| data.max_pixels | - | 2116800 | 2116800 |
| data.max_prompt_length | - | 64000 | 64000 |
| data.max_response_length | - | 8192 | 8192 |
| data.min_pixels | - | 256 | 256 |
| data.prompt_key | - | problem | problem |
| data.rollout_batch_size | - | 16 | 8 |
| data.seed | - | 1 | 1 |
| data.shuffle | - | true | true |
| data.train_files | - | data/evaluation_examples/test_success_uitars1.5_wo_impossible.json | data/evaluation_examples/test_success_middle_difficult.json |
| data.val_batch_size | - | -1 | -1 |
| data.val_files | - | data/evaluation_examples/test_success_uitars1.5_wo_impossible.json | data/evaluation_examples/test_success_middle_difficult.json |
| env.max_steps | - | 15 | 15 |
| env.num_envs | - | 128 | 64 |
| env.screen_size | - | [1920,1080] | [1920,1080] |
| trainer.critic_warmup | - | 0 | 0 |
| trainer.experiment_name | - | osworld_cot_7b_nokl_0516_twonodes_fixlogin_onlinereplay_resume88 | osworld_cot_7b_nokl_0510_twonodes_subset32_middle_withreplay_test2 |
| trainer.load_checkpoint_path | - | ./checkpoints/easy_r1/osworld_cot_7b_nokl_0516_twonodes_fixlogin_onlinereplay/global_step_88/ | - |
| trainer.logger | - | ["console","wandb"] | ["console","wandb"] |
| trainer.n_gpus_per_node | - | 8 | 8 |
| trainer.nnodes | - | 2 | 2 |
| trainer.project_name | - | easy_r1 | easy_r1 |
| trainer.save_checkpoint_path | - | checkpoints/easy_r1/osworld_cot_7b_nokl_0516_twonodes_fixlogin_onlinereplay_resume88 | checkpoints/easy_r1/osworld_cot_7b_nokl_0510_twonodes_subset32_middle_withreplay_test2 |
| trainer.save_freq | - | 8 | 5 |
| trainer.save_limit | - | 3 | 3 |
| trainer.total_episodes | - | 25 | 15 |
| trainer.val_before_train | - | true | true |
| trainer.val_freq | - | 8 | 5 |
| trainer.val_generations_to_log | - | 3 | 3 |
| trainer.val_only | - | false | false |
| worker.actor.clip_ratio_dual | - | 3 | 3 |
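The two configured runs listed above share most settings, so it can help to list only the keys that differ. The following is a minimal sketch of such a comparison, not part of the workspace itself. The names `RUN_CRASHED`, `RUN_FINISHED`, and `diff_configs` are hypothetical, and only a subset of the config keys is copied in for brevity.

```python
# Subset of the config values shown above for the two configured runs.
# RUN_CRASHED is the 21h 49m 45s run; RUN_FINISHED is the 12h 44m 59s run.
# The dictionary names are illustrative, not identifiers from the workspace.
RUN_CRASHED = {
    "algorithm.adv_estimator": "grpo",
    "algorithm.enable_replay": False,
    "data.rollout_batch_size": 16,
    "env.num_envs": 128,
    "trainer.n_gpus_per_node": 8,
    "trainer.nnodes": 2,
    "trainer.save_freq": 8,
    "trainer.total_episodes": 25,
    "trainer.val_freq": 8,
}

RUN_FINISHED = {
    "algorithm.adv_estimator": "grpo",
    "algorithm.enable_replay": True,
    "data.rollout_batch_size": 8,
    "env.num_envs": 64,
    "trainer.n_gpus_per_node": 8,
    "trainer.nnodes": 2,
    "trainer.save_freq": 5,
    "trainer.total_episodes": 15,
    "trainer.val_freq": 5,
}


def diff_configs(a, b):
    """Return {key: (a_value, b_value)} for every key whose values differ.

    Keys present in only one config are reported with None on the other side.
    """
    keys = sorted(set(a) | set(b))
    return {k: (a.get(k), b.get(k)) for k in keys if a.get(k) != b.get(k)}


if __name__ == "__main__":
    for key, (left, right) in diff_configs(RUN_CRASHED, RUN_FINISHED).items():
        print(f"{key}: {left} -> {right}")
```

The comparison uses flat dotted keys exactly as they appear in the runs table, so values exported from the table can be dropped in without restructuring.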