
Partial Rotary Tests v2

Results for rotary embeddings applied to only part of q/k. Dim per head = 64.

Pink - Learned absolute baseline
Brown - Rotary applied to 25% (16/64)
Green - Rotary applied to 50% (32/64)
Blue - Rotary applied to 100% (64/64)
Other Pink - Rotary applied to 25% (16/64), every other layer
Created on April 19|Last edited on April 19
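
For readers unfamiliar with the setup, here is a minimal sketch of what "rotary applied to 25% (16/64)" might look like for a single head. It is not taken from the runs in this report. It assumes a rotate-half pairing convention, a frequency base of 10000, and that the first `rot_dim` dimensions of each head are the rotated ones. The function names (`rotary_tables`, `rotate_half`, `apply_partial_rotary`) are hypothetical.

```python
import numpy as np

def rotary_tables(seq_len, rot_dim, base=10000.0):
    # One frequency per pair of rotated dimensions (base is an assumed default).
    inv_freq = 1.0 / (base ** (np.arange(0, rot_dim, 2) / rot_dim))
    angles = np.outer(np.arange(seq_len), inv_freq)      # (seq_len, rot_dim / 2)
    angles = np.concatenate([angles, angles], axis=-1)   # (seq_len, rot_dim)
    return np.cos(angles), np.sin(angles)

def rotate_half(x):
    # Pairs dimension i with dimension i + half (assumed convention).
    half = x.shape[-1] // 2
    return np.concatenate([-x[..., half:], x[..., :half]], axis=-1)

def apply_partial_rotary(x, rot_pct):
    # x: (seq_len, head_dim) slice of q or k for one head.
    seq_len, head_dim = x.shape
    rot_dim = int(head_dim * rot_pct)
    assert rot_dim % 2 == 0, "rotated width must be even"
    # Rotate the first rot_dim dimensions, pass the rest through unchanged.
    x_rot, x_pass = x[:, :rot_dim], x[:, rot_dim:]
    cos, sin = rotary_tables(seq_len, rot_dim)
    x_rot = x_rot * cos + rotate_half(x_rot) * sin
    return np.concatenate([x_rot, x_pass], axis=-1)

# Example: head_dim = 64, rotary on 25% of it (16 dims), as in the brown run.
q = np.random.default_rng(0).normal(size=(8, 64))
q_out = apply_partial_rotary(q, rot_pct=0.25)
```

With `rot_pct=1.0` the pass-through slice is empty and the whole head is rotated, which corresponds to the 100% (64/64) setting. The same function would be applied to both q and k before the attention dot product.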




[Chart 1: x-axis Step (1k to 2.4k), y-axis ticks 3 to 4]
[Chart 2: x-axis Step (500 to 2.5k), y-axis ticks 2 to 10]

Runs plotted in both charts:
pos emb: rotary, rot pct 0.25, group: rot0.25_halflayersPrT6TJyCnw9C4uq2G4MaCN
pos emb: rotary, rot pct 0.25, group: 5GgHaj4vFYMgJh9NHWLaeY
pos emb: learned, rot pct 1, group: KucbxTiGBHAi6mF4YpXP3G
pos emb: rotary, rot pct 1, group: MjJQugGhST6wpoMFpLuvcT
pos emb: rotary, rot pct 0.5, group: e4qzi98rTcc9d5VaASCa7u
Run set: 5 runs