Olmo 7B Replication

Our attempt at replicating OLMo 7B. We used the OLMo tokenizer and Dolma 1.7. Our architecture is almost identical to the original, except that we use RMSNorm rather than LayerNorm. Neither version learns a bias term or a gain term in the normalization layer.
Created on July 15 | Last edited on May 12
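
The normalization difference described in the summary can be illustrated with a minimal sketch. This is not the code used in these runs. It is a plain-Python illustration of non-parametric LayerNorm versus non-parametric RMSNorm (no learned gain or bias in either), and the function names and the `eps` value are assumptions chosen for the example.

```python
import math

def layer_norm(x, eps=1e-5):
    """Non-parametric LayerNorm: center, then scale to unit variance.

    No learned gain or bias is applied. `eps` is an assumed value.
    """
    mean = sum(x) / len(x)
    var = sum((v - mean) ** 2 for v in x) / len(x)
    return [(v - mean) / math.sqrt(var + eps) for v in x]

def rms_norm(x, eps=1e-5):
    """Non-parametric RMSNorm: scale by the root mean square only.

    Unlike LayerNorm, the mean is not subtracted. No learned gain is applied.
    """
    mean_sq = sum(v * v for v in x) / len(x)
    return [v / math.sqrt(mean_sq + eps) for v in x]

if __name__ == "__main__":
    activations = [1.0, 2.0, 3.0, 4.0]
    print("LayerNorm:", layer_norm(activations))
    print("RMSNorm:  ", rms_norm(activations))
```

The two functions agree whenever the input already has zero mean. Otherwise RMSNorm leaves the mean component in place and skips the centering step.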

Section 1


This set of panels contains runs from a private project, which cannot be shown in this report