Verify that scaling batch size widens the gap between Muon and AdamW

Created on September 7 | Last edited on September 7

[Figure: two line charts plotted against Step (x-axis 0 to 8k); y-axis ranges 3.8 to 4 and 4 to 12]
Run set 46 (rows 1-10 of 10)

Run   State     Notes  User  Tags      Created  Runtime      Sweep
1     Finished  -      when  speedrun           22m 40s      -
2     Finished  -      when  speedrun           43m 34s      -
3     Finished  -      when  speedrun           51m 24s      -
4     Finished  -      when  speedrun           55m 19s      -
5     Finished  -      when  speedrun           39m 22s      -
6     Finished  -      when  speedrun           40m 54s      -
7     Finished  -      when  speedrun           59m 1s       -
8     Finished  -      when  speedrun           1h 10m 21s   -
9     Finished  -      when  speedrun           10h 41m 36s  -
10    Finished  -      when  speedrun           10h 49m 11s  -

Config columns (identical across all 10 rows):

data.cache_options.batch_size: 128
data.cache_options.num_shard_groups: 128
data.cache_options.target_size_per_flush: 512MB
data.configs.dclm_baseline (cache_dir, format.text_key, plaintext, stream, tags, text_key, train_urls, validation_urls): -
data.configs.fineweb-edu-10B.cache_dir: (blank)
data.configs.fineweb-edu-10B.format.text_key: text
data.configs.fineweb-edu-10B.tags: []
data.configs.fineweb-edu-10B.train_urls: ["dummy.jsonl"]
data.configs.fineweb-edu-10B.validation_urls: []
data.configs.paloma/4chan.cache_dir: (blank)
data.configs.paloma/4chan.format.text_key: text
data.configs.paloma/4chan.plaintext: -
data.configs.paloma/4chan.stream: -
data.configs.paloma/4chan.tags: []
data.configs.paloma/4chan.text_key: -
data.configs.paloma/4chan.train_urls: []
data.configs.paloma/4chan.validation_urls: ["gs://marin-us-central2/raw/paloma-speedrun-1d28d7/4chan_meta_sep/val/val*.jsonl.gz"]
data.configs.paloma/c4_100_domains.cache_dir: (blank)
data.configs.paloma/c4_100_domains.format.text_key: text
data.configs.paloma/c4_100_domains.plaintext: -
data.configs.paloma/c4_100_domains.stream: -
data.configs.paloma/c4_100_domains.tags: []
data.configs.paloma/c4_100_domains.text_key: -
data.configs.paloma/c4_100_domains.train_urls: []
data.configs.paloma/c4_100_domains.validation_urls: ["gs://marin-us-central2/raw/paloma-speedrun-1d28d7/c4_100_domains/val/val*.jsonl.gz"]
data.configs.paloma/c4_en.cache_dir: (blank)
data.configs.paloma/c4_en.format.text_key: text
data.configs.paloma/c4_en.plaintext: -
data.configs.paloma/c4_en.stream: -
data.configs.paloma/c4_en.tags: []
data.configs.paloma/c4_en.text_key: -
data.configs.paloma/c4_en.train_urls: []
data.configs.paloma/c4_en.validation_urls: ["gs://marin-us-central2/raw/paloma-speedrun-1d28d7/c4_en/val/val*.jsonl.gz"]
data.configs.paloma/dolma-v1_5.cache_dir: (blank)
data.configs.paloma/dolma-v1_5.format.text_key: text
data.configs.paloma/dolma-v1_5.plaintext: -
data.configs.paloma/dolma-v1_5.stream: -
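The Runtime column of the run set mixes minute-scale and hour-scale strings, which makes the ten runs awkward to compare by eye. A minimal sketch, assuming only the runtime strings listed in the table, converts them to seconds. The helper name `runtime_to_seconds` is hypothetical and is not part of any W&B or Marin API. The table does not say which optimizer or batch size each run used, so this only orders the runs by wall-clock time.

```python
import re

# Runtime strings copied from the Runtime column of the run set, in row order.
RUNTIMES = [
    "22m 40s", "43m 34s", "51m 24s", "55m 19s", "39m 22s",
    "40m 54s", "59m 1s", "1h 10m 21s", "10h 41m 36s", "10h 49m 11s",
]

_UNIT_SECONDS = {"h": 3600, "m": 60, "s": 1}


def runtime_to_seconds(text: str) -> int:
    """Convert a string such as '1h 10m 21s' into a number of seconds.

    Hypothetical helper: only the h/m/s units seen in the table are handled.
    """
    total = 0
    for value, unit in re.findall(r"(\d+)\s*([hms])", text):
        total += int(value) * _UNIT_SECONDS[unit]
    return total


# Seconds per run, in the same order as the table rows.
seconds = [runtime_to_seconds(r) for r in RUNTIMES]

# Row numbers (1-based) sorted from shortest to longest runtime.
rows_by_runtime = sorted(range(1, len(seconds) + 1), key=lambda i: seconds[i - 1])

if __name__ == "__main__":
    for row in rows_by_runtime:
        print(f"row {row}: {RUNTIMES[row - 1]} = {seconds[row - 1]} s")
```

Sorting on seconds rather than on the raw strings avoids the lexical trap where "10h 41m 36s" would sort before "22m 40s".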


This report includes the experiments in https://github.com/marin-community/marin/pull/1558.