Annavettoruzzo's group workspace
Group: flame-moe-290m
State: Finished
User: haok
Runtime: 1h 59m 41s
Sweep: -

account_for_embedding_in_pipeline_split: false
account_for_loss_in_pipeline_split: false
accumulate_allreduce_grads_in_fp32: true
adam_beta1: 0.9
adam_beta2: 0.999
adam_eps: 1.0000e-8
add_bias_linear: false
add_position_embedding: true
add_qkv_bias: false
adlr_autoresume: false
adlr_autoresume_interval: 1000
align_grad_reduce: true
align_param_gather: false
app_tag_run_version: 0.0.0
apply_layernorm_1p: false
apply_query_key_layer_scaling: false
apply_residual_connection_post_layernorm: false
apply_rope_fusion: true
async_tensor_model_parallel_allreduce: true
attention_backend: auto
attention_dropout: 0
attention_softmax_in_fp32: false
auto_detect_ckpt_format: false
barrier_with_L1_time: true
bert_binary_head: true
bert_embedder_type: megatron
bf16: true
bias_dropout_fusion: true
bias_gelu_fusion: false
bias_swiglu_fusion: true
biencoder_projection_dim: 0
biencoder_shared_query_context_model: false
calc_ft_timeouts: false
calculate_per_token_loss: false
check_for_large_grads: false
check_for_nan_in_loss_and_grad: true
check_for_spiky_loss: false
ckpt_assume_constant_structure: false
ckpt_convert_update_legacy_dist_opt_format: false
ckpt_format: torch_dist
ckpt_fully_parallel_load: false
ckpt_fully_parallel_save: true
ckpt_fully_parallel_save_deprecated: false
classes_fraction: 1
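
The adam_beta1, adam_beta2 and adam_eps values listed for this run look like standard Adam hyperparameters. As a rough illustration of where each one enters the update, here is a minimal sketch of a textbook Adam step on a single scalar parameter. It is not the optimizer implementation used by the training code, which may add weight decay, mixed precision or fused kernels. The function name `adam_step` and the learning rate `lr` are hypothetical; no learning rate appears in the columns shown here.

```python
import math

# Values as listed for this run.
ADAM_BETA1 = 0.9
ADAM_BETA2 = 0.999
ADAM_EPS = 1.0e-8


def adam_step(param, grad, m, v, step, lr):
    """One textbook Adam update for a single scalar parameter.

    `lr` is a hypothetical learning rate (not listed in the run's
    columns above). `step` counts from 1.
    Returns the updated (param, m, v).
    """
    # Exponential moving averages of the gradient and squared gradient.
    m = ADAM_BETA1 * m + (1.0 - ADAM_BETA1) * grad
    v = ADAM_BETA2 * v + (1.0 - ADAM_BETA2) * grad * grad

    # Bias correction for the zero-initialised moving averages.
    m_hat = m / (1.0 - ADAM_BETA1 ** step)
    v_hat = v / (1.0 - ADAM_BETA2 ** step)

    # adam_eps keeps the denominator away from zero.
    param = param - lr * m_hat / (math.sqrt(v_hat) + ADAM_EPS)
    return param, m, v


if __name__ == "__main__":
    p, m, v = adam_step(param=1.0, grad=2.0, m=0.0, v=0.0, step=1, lr=0.1)
    print(p, m, v)
```

On the first step the bias-corrected ratio is close to the sign of the gradient, so the parameter moves by roughly `lr` regardless of the gradient's magnitude.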