Verify scaling batch size widens the gap between Muon & AdamW
Created on September 7 | Last edited on September 7
Charts: eval/paloma/c4_en/loss, train/loss
Run set (4646 runs; 10 visualized)

State     User  Tags      Runtime
Finished  when  speedrun  22m 40s
Finished  when  speedrun  43m 34s
Finished  when  speedrun  51m 24s
Finished  when  speedrun  55m 19s
Finished  when  speedrun  39m 22s
Finished  when  speedrun  40m 54s
Finished  when  speedrun  59m 1s
Finished  when  speedrun  1h 10m 21s
Finished  when  speedrun  10h 41m 36s
Finished  when  speedrun  10h 49m 11s

Notes and Sweep are empty for every run. The following data configuration is identical across all 10 runs ("-" marks an empty field):

data.cache_options.batch_size: 128
data.cache_options.num_shard_groups: 128
data.cache_options.target_size_per_flush: 512MB

data.configs.dclm_baseline (cache_dir, format.text_key, plaintext, stream, tags, text_key, train_urls, validation_urls): all -

data.configs.fineweb-edu-10B.format.text_key: text
data.configs.fineweb-edu-10B.tags: []
data.configs.fineweb-edu-10B.train_urls: ["dummy.jsonl"]
data.configs.fineweb-edu-10B.validation_urls: []

data.configs.paloma/4chan.format.text_key: text
data.configs.paloma/4chan.plaintext: -
data.configs.paloma/4chan.stream: -
data.configs.paloma/4chan.tags: []
data.configs.paloma/4chan.text_key: -
data.configs.paloma/4chan.train_urls: []
data.configs.paloma/4chan.validation_urls: ["gs://marin-us-central2/raw/paloma-speedrun-1d28d7/4chan_meta_sep/val/val*.jsonl.gz"]

data.configs.paloma/c4_100_domains.format.text_key: text
data.configs.paloma/c4_100_domains.plaintext: -
data.configs.paloma/c4_100_domains.stream: -
data.configs.paloma/c4_100_domains.tags: []
data.configs.paloma/c4_100_domains.text_key: -
data.configs.paloma/c4_100_domains.train_urls: []
data.configs.paloma/c4_100_domains.validation_urls: ["gs://marin-us-central2/raw/paloma-speedrun-1d28d7/c4_100_domains/val/val*.jsonl.gz"]

data.configs.paloma/c4_en.format.text_key: text
data.configs.paloma/c4_en.plaintext: -
data.configs.paloma/c4_en.stream: -
data.configs.paloma/c4_en.tags: []
data.configs.paloma/c4_en.text_key: -
data.configs.paloma/c4_en.train_urls: []
data.configs.paloma/c4_en.validation_urls: ["gs://marin-us-central2/raw/paloma-speedrun-1d28d7/c4_en/val/val*.jsonl.gz"]

data.configs.paloma/dolma-v1_5.format.text_key: text
data.configs.paloma/dolma-v1_5.plaintext: -
data.configs.paloma/dolma-v1_5.stream: -
Created with ❤️ on Weights & Biases.
https://wandb.ai/marin-community/optimizer-scaling/reports/Verify-scaling-batch-size-widens-the-gap-between-Muon-AdamW--VmlldzoxNDI5MjAzMw