Mikasenghaas's workspace
Runs
833
| Name | eval/perplexity/average (Min) |
| --- | --- |
| DiLoCo-SWARM (GPT-2 Large) | 29.17067 |
| Baseline (GPT-2 Large) | 30.18166 |
| DiLoCo-SWARM (GPT-2 Medium) | 25.66126 |
| Baseline (GPT-2 Medium) | 31.5672 |
| DiLoCo-SWARM (GPT-2 Tiny) | 109.54656 |
| Baseline (GPT-2 Tiny) | 133.2024 |
| Baseline (GPT-2 Small, 8000 steps) | 24.36893 |
| DiLoCo-SWARM | 31.2675 |
| DiLoCo-SWARM | 28.61161 |
| DiLoCo-SWARM | 2516.41168 |
| DiLoCo-SWARM | 30.48685 |
| DiLoCo-SWARM | 27.95415 |
| DiLoCo-SWARM | 30.14847 |
| Baseline (SWARM) | 30.22404 |
| Baseline (DiLoCo) | 29.25856 |
| Baseline (DDP) | 28.53477 |
| Baseline (Single GPU) | 37.09231 |
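
As a rough aid for reading the model-size rows above, the following sketch computes the relative reduction in minimum evaluation perplexity of each DiLoCo-SWARM run against the baseline with the same model-size label. The pairing by label and the helper name `relative_reduction` are assumptions made for illustration; the numbers are copied from the runs listed above, and nothing here is taken from the author's own analysis code.

```python
# Minimum eval perplexity per run, copied from the runs listed above.
# Pairing each DiLoCo-SWARM run with the baseline of the same model size
# is an assumption made for this illustration.
min_perplexity = {
    "GPT-2 Large": {"diloco_swarm": 29.17067, "baseline": 30.18166},
    "GPT-2 Medium": {"diloco_swarm": 25.66126, "baseline": 31.5672},
    "GPT-2 Tiny": {"diloco_swarm": 109.54656, "baseline": 133.2024},
}


def relative_reduction(baseline: float, candidate: float) -> float:
    """Return the percentage by which `candidate` is lower than `baseline`."""
    return 100.0 * (baseline - candidate) / baseline


# Hypothetical summary: percentage reduction per model size.
reductions = {
    size: relative_reduction(values["baseline"], values["diloco_swarm"])
    for size, values in min_perplexity.items()
}

for size, pct in reductions.items():
    print(f"{size}: {pct:.2f}% lower minimum perplexity than baseline")
```

Lower perplexity is better, so a positive percentage means the DiLoCo-SWARM run reached a lower minimum than its paired baseline. Because the metric is the minimum over each run, the comparison says nothing about training cost or the step at which that minimum occurred.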
Links
Find experiment-specific views here:
- Experiment 1: DiLoCo-SWARM
- Experiment 2: Ablation on communication frequency
- Experiment 3: Ablation on model size