Position encoding shootout

Created on April 18 | Last edited on August 25

Section 1


[Charts: four panels plotted against Step]
Run set (3 runs)

| Field | GPT3_XL_pile_rotary | GPT3_XL_pile_fixed | GPT3_XL_pile_t5 |
| --- | --- | --- | --- |
| State | Killed | Killed | Killed |
| Notes | | | |
| User | kindiana | kindiana | kindiana |
| Tags | | | |
| Created | | | |
| Runtime | 19h 35m 48s | 23h 17m 27s | 8h 53m 8s |
| Sweep | - | - | - |
| anneal_steps | 100000 | 100000 | 100000 |
| bucket | neo-models | neo-models | neo-models |
| ckpt_every | 1000 | 1000 | 1000 |
| comment | Trying to replicate a MTF 1.3B run for use on a V3-256 | Trying to replicate a MTF 1.3B run for use on a V3-256 | Trying to replicate a MTF 1.3B run for use on a V3-256 |
| cores_per_replica | 2 | 2 | 2 |
| d_model | 2048 | 2048 | 2048 |
| end_lr | 0.00002 | 0.00002 | 0.00002 |
| gradient_accumulation_steps | 8 | 8 | 8 |
| layers | 24 | 24 | 24 |
| lr | 0.0002 | 0.0002 | 0.0002 |
| model_dir | pile_xl_rotary | pile_xl_fixed | pile_xl_t5 |
| n_heads | 16 | 16 | 16 |
| n_vocab | 50400 | 50400 | 50400 |
| name | GPT3_XL_pile_rotary | GPT3_XL_pile_fixed | GPT3_XL_pile_t5 |
| per_replica_batch | 1 | 1 | 1 |
| seq | 2048 | 2048 | 2048 |
| total_steps | 100000 | 100000 | 100000 |
| tpu_name | - | - | - |
| tpu_size | 128 | 128 | 128 |
| tpus_per_replica | - | - | - |
| train_set | pile.train.index | pile.train.index | pile.train.index |
| val_batches | 100 | 100 | 100 |
| val_every | 1000 | 1000 | 1000 |
| val_set | - | - | - |
| warmup_steps | 1000 | 1000 | 1000 |
| weight_decay | 0.1 | 0.1 | 0.1 |
| keep_every | 100000 | 100000 | 100000 |
| norm | layernorm-desync | layernorm-desync | layernorm-desync |
| optimizer | ["optax._src.combine.chain.<locals>.init_fn","optax._src.combine.chain.<locals>.update_fn"] | ["optax._src.combine.chain.<locals>.init_fn","optax._src.combine.chain.<locals>.update_fn"] | ["optax._src.combine.chain.<locals>.init_fn","optax._src.combine.chain.<locals>.update_fn"] |
| pe | rotary | fixed | t5 |
| val_set.owt | - | - | openwebtext2_new_inputs.val.index |
| val_set.pile | pile.val.index | pile.val.index | pile.val.index |
| bf16_optimizer | - | - | - |
| d_head | - | - | - |
| early_collect | - | - | - |
| eos_token_id | - | - | - |
| eval_harness_tasks | - | - | - |
| mask_token_id | - | - | - |
| mlm_probability | - | - | - |
| pe_rotary_dims | - | - | - |
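
The shared hyperparameters in the run table are enough to sanity-check the "1.3B" in the comment field. The sketch below is a back-of-envelope estimate, not the training code. It assumes a standard GPT-style block with roughly 12 * d_model^2 weights per layer (attention plus a 4x-wide MLP) and a single n_vocab x d_model embedding matrix, and it ignores biases, norm parameters, and anything the position encoding itself might add. The function name `estimate_params` is hypothetical.

```python
# Shared hyperparameters copied from the run table.
config = {"layers": 24, "d_model": 2048, "n_heads": 16, "n_vocab": 50400}


def estimate_params(cfg):
    """Rough parameter count for a GPT-style model.

    Assumption: ~12 * d_model^2 weights per layer (4 * d_model^2 for attention,
    8 * d_model^2 for a 4x-wide MLP), plus one embedding matrix.
    """
    per_layer = 12 * cfg["d_model"] ** 2
    embedding = cfg["n_vocab"] * cfg["d_model"]
    return cfg["layers"] * per_layer + embedding


# Per-head width implied by d_model and n_heads (the d_head column is unset in these runs).
d_head = config["d_model"] // config["n_heads"]
n_params = estimate_params(config)

print(f"d_head = {d_head}")
print(f"approx. parameters = {n_params / 1e9:.2f}B")
```

Under these assumptions the estimate comes to about 1.31B parameters with a per-head width of 128, which is in line with the 1.3B target named in the comment.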


