Rotary Test 3
150M-parameter model on OWT2, comparing learned embeddings (blue), rotary embeddings (green), RPE (brown), and RPE with caching (peach)
Created on April 15 | Last edited on April 15
Section 1
Run set (4 runs)