
MLKV Small-Scale Experiments

I tried my best to give all the models the same param count by scaling the FFNN multiplier. It's hard to get it exactly the same.

small-llama-poetry: vanilla MHA; 64 heads; 85,074,432 params
small-llama-gqa-16-poetry: GQA; 16 heads; 85,099,008 params
small-llama-mlkv-16-poetry: MLKV; 16 heads; 85,099,008 params
small-llama-mqa-poetry: MQA; 8 heads; 85,078,528 params
small-llama-mlkv-lin-poetry: MLKV; 2 heads; 85,078,528 params
small-llama-mlkv-poetry: MLKV with FFNN KV heads; 2 heads; 85,074,432 params
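To make the "scale the FFNN multiplier" idea concrete, here is a rough sketch of how the matching could be done. Everything in it is an assumption for illustration, not the actual setup of these runs: the function names `count_params` and `match_d_ff` are made up, the layout is a generic Llama-style block (bias-free projections, a three-matrix gated FFNN, tied embeddings), the query heads are assumed to span the full model width, and the sizes in `shared` are hypothetical. The KV head count is treated as a total summed over all layers, which is also an assumption.

```python
def count_params(d_model, n_layers, head_dim, total_kv_heads, d_ff, vocab_size):
    """Parameter count of a hypothetical Llama-style decoder (no biases)."""
    q_and_o = 2 * d_model * d_model * n_layers          # query and output projections
    k_and_v = 2 * d_model * head_dim * total_kv_heads   # one K and one V projection per KV head
    mlp = 3 * d_model * d_ff * n_layers                 # gate, up and down matrices
    norms = 2 * d_model * n_layers + d_model            # two norms per layer plus the final norm
    embeddings = vocab_size * d_model                   # assumes tied input/output embeddings
    return q_and_o + k_and_v + mlp + norms + embeddings


def match_d_ff(target_params, d_model, n_layers, head_dim, total_kv_heads, vocab_size):
    """Pick the FFNN width whose total param count lands closest to target_params."""
    base = count_params(d_model, n_layers, head_dim, total_kv_heads, 0, vocab_size)
    step = 3 * d_model * n_layers  # params added by widening the FFNN by one unit
    return max(1, round((target_params - base) / step))


# Hypothetical sizes, chosen only to make the arithmetic easy to follow.
shared = dict(d_model=512, n_layers=8, head_dim=64, vocab_size=32000)

target = count_params(total_kv_heads=64, d_ff=1024, **shared)   # baseline with many KV heads
d_ff_small = match_d_ff(target, total_kv_heads=8, **shared)     # fewer KV heads, wider FFNN
matched = count_params(total_kv_heads=8, d_ff=d_ff_small, **shared)
gap = matched - target

print(d_ff_small, matched, target, gap)
```

The FFNN width can only move in whole units, and each unit changes the total by `3 * d_model * n_layers` params, so the counts generally end up close but not identical.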
Created on November 10 | Last edited on November 10

Section 1


[Two plots; x-axis: tokens, 0 to 80M; y-axis: 100 to 300]
Run set: 6 runs