Olmo 7B Replication

Our attempt at replicating OLMo 7B. We used the OLMo tokenizer and Dolma 1.7. Our architecture is almost identical, except that we use RMSNorm rather than LayerNorm (in both versions, the normalization layer learns neither a bias term nor a gain term).
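To make the normalization difference concrete, here is a minimal NumPy sketch of both variants in their non-parametric form (no learned gain or bias). The function names and `eps` value are illustrative, not taken from either codebase:

```python
import numpy as np

def rms_norm(x: np.ndarray, eps: float = 1e-5) -> np.ndarray:
    """Non-parametric RMSNorm: divide by the root-mean-square of the
    last axis; no learned gain or bias (as in our setup)."""
    rms = np.sqrt(np.mean(x**2, axis=-1, keepdims=True) + eps)
    return x / rms

def layer_norm(x: np.ndarray, eps: float = 1e-5) -> np.ndarray:
    """Non-parametric LayerNorm: subtract the mean, then divide by the
    standard deviation; again no gain or bias (as in OLMo 7B)."""
    mu = np.mean(x, axis=-1, keepdims=True)
    var = np.var(x, axis=-1, keepdims=True)
    return (x - mu) / np.sqrt(var + eps)
```

RMSNorm skips the mean-subtraction step, which saves a small amount of compute and is reported to train comparably in practice.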
Created on July 15 | Last edited on May 12

Section 1


[Line chart: train/loss for the run set (3 runs)]