Partial Rotary Tests v2
Results for rotary position embeddings applied to only a fraction of the query/key (q/k) dimensions in each attention head.
dim per head = 64
Pink - Learned absolute position embeddings (baseline)
Brown - Rotary applied to 25% (16/64)
Green - Rotary applied to 50% (32/64)
Blue - Rotary applied to 100% (64/64)
Other Pink - Rotary applied to 25% (16/64) every other layer
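
For readers unfamiliar with the setup, here is a minimal sketch of what "rotary applied to N of 64 dimensions" could look like for a single head vector. It is an illustration only, not the code used for these runs: the function name `apply_partial_rotary`, the choice to rotate the first `rotary_dim` features, the adjacent-pair layout, and the base of 10000 are all assumptions.

```python
import math


def apply_partial_rotary(x, position, rotary_dim, base=10000.0):
    """Rotate the first `rotary_dim` features of one head vector.

    Features beyond `rotary_dim` are passed through unchanged.
    Assumptions (not taken from the report): adjacent features form
    the rotated pairs, and the frequency base is 10000.
    """
    if rotary_dim % 2 != 0 or not 0 <= rotary_dim <= len(x):
        raise ValueError("rotary_dim must be even and at most len(x)")
    out = list(x)
    for i in range(rotary_dim // 2):
        # Rotation angle for the i-th pair at this sequence position.
        theta = position * base ** (-2.0 * i / rotary_dim)
        cos, sin = math.cos(theta), math.sin(theta)
        a, b = x[2 * i], x[2 * i + 1]
        out[2 * i] = a * cos - b * sin
        out[2 * i + 1] = a * sin + b * cos
    return out


head_dim = 64
# Rotary widths matching the fractions in the legend: 16, 32, 64.
rotary_dims = [int(head_dim * fraction) for fraction in (0.25, 0.5, 1.0)]
```

The same function would be applied to both q and k before the attention dot product, so the rotated slice carries relative position information while the remaining features stay position-independent.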
Created on April 19 | Last edited on April 19
Run set: 5 runs