Plot: Dolma 1.7 vs Dolma-OLMoE
Created on August 4 | Last edited on August 29
Validation loss on C4, The Pile, etc. is expected to be worse for our new data (news). Dolma 1.7 contains the C4 training set, along with other training sets similar to those in The Pile (such as Reddit), so its distribution is closer to those validation sets.