Olmo 7B Replication
Our attempt at replicating OLMo 7B. We used the OLMo tokenizer and Dolma 1.7. Our architecture was almost identical, except that we use RMSNorm rather than LayerNorm (neither version learns a bias term or a gain term in the normalization layer).
David Leo Wright Hall
Created on July 15 | Last edited on May 12
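To make the normalization difference in the summary concrete, here is a minimal sketch of both layers without learned bias or gain. It is illustrative only and is not the code used for these runs: the function names, the `eps` value, and the placement of `eps` inside the square root are assumptions.

```python
import math


def layer_norm(x, eps=1e-5):
    """Non-parametric LayerNorm: subtract the mean, then divide by the standard deviation.

    eps is an assumed value for numerical stability, not taken from the report.
    """
    mean = sum(x) / len(x)
    var = sum((v - mean) ** 2 for v in x) / len(x)
    return [(v - mean) / math.sqrt(var + eps) for v in x]


def rms_norm(x, eps=1e-5):
    """Non-parametric RMSNorm: divide by the root mean square, with no mean subtraction.

    eps is an assumed value for numerical stability, not taken from the report.
    """
    mean_square = sum(v * v for v in x) / len(x)
    return [v / math.sqrt(mean_square + eps) for v in x]


if __name__ == "__main__":
    x = [1.0, 2.0, 3.0, 4.0]
    print("LayerNorm:", layer_norm(x))
    print("RMSNorm:  ", rms_norm(x))
```

The two functions give the same output when the input already has zero mean; they differ only in that RMSNorm skips the centering step.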
Section 1
[Line chart: log(train/loss) vs tokens. Run set: 3 runs.]