Section 1
[Plots: val/loss_pile, train/loss, train/last_loss, _step]
Run set (3 runs)

| name | GPT3_XL_pile_rotary | GPT3_XL_pile_fixed | GPT3_XL_pile_t5 |
| --- | --- | --- | --- |
| State | Killed | Killed | Killed |
| Notes | | | |
| User | kindiana | kindiana | kindiana |
| Tags | | | |
| Created | | | |
| Runtime | 19h 35m 48s | 23h 17m 27s | 8h 53m 8s |
| Sweep | - | - | - |
| anneal_steps | 100000 | 100000 | 100000 |
| bucket | neo-models | neo-models | neo-models |
| ckpt_every | 1000 | 1000 | 1000 |
| comment | Trying to replicate a MTF 1.3B run for use on a V3-256 | Trying to replicate a MTF 1.3B run for use on a V3-256 | Trying to replicate a MTF 1.3B run for use on a V3-256 |
| cores_per_replica | 2 | 2 | 2 |
| d_model | 2048 | 2048 | 2048 |
| end_lr | 0.00002 | 0.00002 | 0.00002 |
| gradient_accumulation_steps | 8 | 8 | 8 |
| layers | 24 | 24 | 24 |
| lr | 0.0002 | 0.0002 | 0.0002 |
| model_dir | pile_xl_rotary | pile_xl_fixed | pile_xl_t5 |
| n_heads | 16 | 16 | 16 |
| n_vocab | 50400 | 50400 | 50400 |
| per_replica_batch | 1 | 1 | 1 |
| seq | 2048 | 2048 | 2048 |
| total_steps | 100000 | 100000 | 100000 |
| tpu_name | - | - | - |
| tpu_size | 128 | 128 | 128 |
| tpus_per_replica | - | - | - |
| train_set | pile.train.index | pile.train.index | pile.train.index |
| val_batches | 100 | 100 | 100 |
| val_every | 1000 | 1000 | 1000 |
| val_set | - | - | - |
| warmup_steps | 1000 | 1000 | 1000 |
| weight_decay | 0.1 | 0.1 | 0.1 |
| keep_every | 100000 | 100000 | 100000 |
| norm | layernorm-desync | layernorm-desync | layernorm-desync |
| optimizer | ["optax._src.combine.chain.<locals>.init_fn","optax._src.combine.chain.<locals>.update_fn"] | ["optax._src.combine.chain.<locals>.init_fn","optax._src.combine.chain.<locals>.update_fn"] | ["optax._src.combine.chain.<locals>.init_fn","optax._src.combine.chain.<locals>.update_fn"] |
| pe | rotary | fixed | t5 |
| val_set.owt | - | - | openwebtext2_new_inputs.val.index |
| val_set.pile | pile.val.index | pile.val.index | pile.val.index |
| bf16_optimizer | - | - | - |
| d_head | - | - | - |
| early_collect | - | - | - |
| eos_token_id | - | - | - |
| eval_harness_tasks | - | - | - |
| mask_token_id | - | - | - |
| mlm_probability | - | - | - |
| pe_rotary_dims | - | - | - |
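As a rough sanity check on the hyperparameters shared by all three runs, the sketch below derives a few quantities from them. The values are copied from the run set. The per-head width is simply `d_model / n_heads`, since `d_head` is logged as `-`. The parameter count uses the common `12 * layers * d_model^2` approximation plus one embedding matrix, not an exact count. The shape of the learning-rate schedule (linear warmup, then cosine anneal from `lr` to `end_lr`) is an assumption: the run set records only the schedule's values, and `lr_at` is a hypothetical helper, not a function from the training code.

```python
import math

# Values copied from the run set; identical across the three runs.
config = {
    "d_model": 2048,
    "n_heads": 16,
    "layers": 24,
    "n_vocab": 50400,
    "lr": 2e-4,
    "end_lr": 2e-5,
    "warmup_steps": 1000,
    "anneal_steps": 100_000,
}

# Per-head width implied by d_model and n_heads.
head_dim = config["d_model"] // config["n_heads"]

# Rough size estimate: 12 * layers * d_model^2 for the transformer blocks
# plus a single n_vocab x d_model embedding matrix (biases and norms ignored).
approx_params = (
    12 * config["layers"] * config["d_model"] ** 2
    + config["n_vocab"] * config["d_model"]
)


def lr_at(step, cfg=config):
    """Hypothetical schedule (assumed): linear warmup to lr, then cosine anneal to end_lr."""
    warmup = min(max(step, 0), cfg["warmup_steps"]) / cfg["warmup_steps"]
    anneal = (
        min(max(step - cfg["warmup_steps"], 0), cfg["anneal_steps"])
        / cfg["anneal_steps"]
    )
    cosine_drop = (cfg["lr"] - cfg["end_lr"]) * (1 - math.cos(math.pi * anneal)) / 2
    return warmup * cfg["lr"] - cosine_drop


if __name__ == "__main__":
    print("head_dim:", head_dim)
    print("approx_params:", approx_params)
    for step in (0, 500, 1000, 50_000, 101_000):
        print(step, lr_at(step))
```

The estimate comes out at about 1.31 billion parameters, which is consistent with the "1.3B" in the run comment. Under the assumed schedule, the rate peaks at `lr` once warmup ends and reaches `end_lr` after a further `anneal_steps` steps.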
Created with ❤️ on Weights & Biases.
https://wandb.ai/eleutherai/mesh-transformer-jax/reports/Position-encoding-shootout--Vmlldzo2MTg2MzY