Andreaskoepf's workspace
Runs (2)
| Column | Run 1 | Run 2 |
| --- | --- | --- |
| State | Finished | Crashed |
| Notes | - | - |
| User | andreaskoepf | andreaskoepf |
| Tags | | |
| Created | | |
| Runtime | 22h 39m 5s | 47m 5s |
| Sweep | - | - |
| actor_init_on_gpu | false | false |
| actor_learning_rate | 5.0000e-7 | 5.0000e-7 |
| adam_betas | [0.9,0.95] | [0.9,0.95] |
| adam_offload | true | true |
| advantage_estimator | gae | gae |
| apply_chat_template | true | true |
| aux_loss_coef | 0 | 0 |
| bf16 | true | true |
| ckpt_path | ./ckpt/checkpoints_ppo | ./ckpt/checkpoints_ppo |
| critic_learning_rate | 0.000009 | 0.000009 |
| critic_pretrain | meta-llama/Llama-3.2-3B-Instruct | meta-llama/Llama-3.2-1B-Instruct |
| disable_ds_ckpt | false | false |
| disable_fast_tokenizer | false | false |
| enable_ema | false | false |
| eps_clip | 0.2 | 0.2 |
| eval_steps | -1 | -1 |
| flash_attn | true | true |
| freezing_actor_steps | -1 | -1 |
| gamma | 1 | 1 |
| generate_max_len | 1024 | 1024 |
| gradient_checkpointing | true | true |
| gradient_checkpointing_use_reentrant | false | false |
| init_kl_coef | 0.01 | 0.01 |
| input_key | question | question |
| l2 | 0 | 0 |
| lambd | 1 | 1 |
| load_checkpoint | false | false |
| load_in_4bit | false | false |
| local_rank | 0 | 0 |
| logging_steps | 1 | 1 |
| lora_alpha | 16 | 16 |
| lora_dropout | 0 | 0 |
| lora_rank | 0 | 0 |
| lr_warmup_ratio | 0.03 | 0.03 |
| max_ckpt_mem | 100000000 | 100000000 |
| max_ckpt_num | 3 | 3 |
| max_epochs | 1 | 1 |
| max_norm | 1 | 1 |
| max_samples | 100000 | 100000 |
| micro_rollout_batch_size | 4 | 4 |
| micro_train_batch_size | 2 | 2 |
| n_samples_per_prompt | 1 | 1 |
| normalize_reward | true | true |
| num_episodes | 1 | 1 |
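
With this many columns, it is easy to miss which settings actually differ between the finished and the crashed run. The following is a minimal sketch, not part of the workspace itself, that diffs two config dictionaries. The dictionary names and the `diff_configs` helper are hypothetical, and only a subset of the listed columns is copied in for brevity.

```python
# Hypothetical sketch: compare a subset of the columns listed for the two runs.
# The values are copied from the run listing; the variable names are assumptions.
run_finished = {
    "State": "Finished",
    "Runtime": "22h 39m 5s",
    "actor_learning_rate": 5e-7,
    "critic_learning_rate": 9e-6,
    "critic_pretrain": "meta-llama/Llama-3.2-3B-Instruct",
    "eps_clip": 0.2,
    "gamma": 1,
    "lambd": 1,
    "init_kl_coef": 0.01,
}
run_crashed = {
    "State": "Crashed",
    "Runtime": "47m 5s",
    "actor_learning_rate": 5e-7,
    "critic_learning_rate": 9e-6,
    "critic_pretrain": "meta-llama/Llama-3.2-1B-Instruct",
    "eps_clip": 0.2,
    "gamma": 1,
    "lambd": 1,
    "init_kl_coef": 0.01,
}


def diff_configs(a, b):
    """Return {key: (a_value, b_value)} for every key whose values differ."""
    keys = sorted(set(a) | set(b))
    return {k: (a.get(k), b.get(k)) for k in keys if a.get(k) != b.get(k)}


differences = diff_configs(run_finished, run_crashed)
for key, (left, right) in differences.items():
    print(f"{key}: {left} -> {right}")
```

Among the columns shown, the only hyperparameter that differs is `critic_pretrain` (the 3B versus the 1B Llama 3.2 Instruct checkpoint); `State` and `Runtime` are outcomes rather than settings.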