Mhardik003's workspace
Runs: 4
| Field | Run 1 | Run 2 | Run 3 | Run 4 |
| --- | --- | --- | --- | --- |
| State | Finished | Finished | Finished | Finished |
| Notes | - | - | - | - |
| User | mhardik003 | mhardik003 | mhardik003 | mhardik003 |
| Tags | | | | |
| Created | | | | |
| Runtime | 1h 48m 26s | 1h 46m 54s | 3h 26m 28s | 3h 29m 2s |
| Sweep | - | - | - | - |
| _name_or_path | distilbert-base-uncased | distilbert-base-uncased | albert-base-v2 | roberta-base |
| activation | gelu | gelu | - | - |
| activation_function | - | - | - | - |
| adafactor | false | false | false | false |
| adam_beta1 | 0.9 | 0.9 | 0.9 | 0.9 |
| adam_beta2 | 0.999 | 0.999 | 0.999 | 0.999 |
| adam_epsilon | 1.0000e-8 | 1.0000e-8 | 1.0000e-8 | 1.0000e-8 |
| add_cross_attention | false | false | false | false |
| afn | - | - | - | - |
| alibi | - | - | - | - |
| apply_residual_connection_post_layernorm | - | - | - | - |
| architectures | ["DistilBertForMaskedLM"] | ["DistilBertForMaskedLM"] | ["AlbertForMaskedLM"] | ["RobertaForMaskedLM"] |
| attention_dropout | 0.1 | 0.1 | - | - |
| attention_probs_dropout_prob | - | - | 0 | 0.1 |
| attn_pdrop | - | - | - | - |
| auto_find_batch_size | false | false | false | false |
| bf16 | false | false | false | false |
| bf16_full_eval | false | false | false | false |
| bias | - | - | - | - |
| bos_token_id | - | - | 2 | 0 |
| chunk_size_feed_forward | 0 | 0 | 0 | 0 |
| classifier_dropout | - | - | - | - |
| classifier_dropout_prob | - | - | 0.1 | - |
| d_ff | - | - | - | - |
| d_kv | - | - | - | - |
| d_model | - | - | - | - |
| dataloader_drop_last | false | false | false | false |
| dataloader_num_workers | 0 | 0 | 0 | 0 |
| dataloader_pin_memory | true | true | true | true |
| ddp_timeout | 1800 | 1800 | 1800 | 1800 |
| debug | [] | [] | [] | [] |
| decoder_start_token_id | - | - | - | - |
| dense_act_fn | - | - | - | - |
| dim | 768 | 768 | - | - |
| disable_tqdm | false | false | false | false |
| diversity_penalty | 0 | 0 | 0 | 0 |
| do_eval | true | true | true | true |
| do_predict | false | false | false | false |
| do_sample | false | false | false | false |
| do_train | false | false | false | false |
| down_scale_factor | - | - | 1 | - |
| dropout | 0.1 | 0.1 | - | - |
| dropout_rate | - | - | - | - |
| early_stopping | false | false | false | false |