MLKV Small-Scale Experiments
I tried my best to give all the models the same parameter count by scaling the FFNN multiplier, but it's hard to get them exactly equal:

- small-llama-poetry: vanilla MHA; 64 heads; 85,074,432 params
- small-llama-gqa-16-poetry: GQA; 16 heads; 85,099,008 params
- small-llama-mlkv-16-poetry: MLKV; 16 heads; 85,099,008 params
- small-llama-mqa-poetry: MQA; 8 heads; 85,078,528 params
- small-llama-mlkv-lin-poetry: MLKV; 2 heads; 85,078,528 params
- small-llama-mlkv-poetry: MLKV with FFNN KV heads; 2 heads; 85,074,432 params
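A little arithmetic shows why an exact match is hard. The sketch below makes several assumptions:

- It treats the head counts above as total KV heads across the whole model.
- It assumes a LLaMA-style decoder with bias-free K/V projections and a SwiGLU FFN (three d_model × ffn_dim matrices per layer).
- It uses a hypothetical config (d_model=512, head_dim=64, 8 layers, FFN width 1376) that is not the one used in these runs.
- It does not model the variant with FFNN KV heads.

Under these assumptions, each removed KV head frees 2 · d_model · head_dim parameters. Widening the FFN by one unit adds 3 · d_model · n_layers parameters. The freed budget is usually not a whole multiple of that step, so some mismatch remains after rounding.

```python
def kv_proj_params(d_model: int, head_dim: int, total_kv_heads: int) -> int:
    # K and V projections (d_model -> head_dim each, per KV head), no biases: an assumption
    return 2 * d_model * head_dim * total_kv_heads


def param_mismatch(d_model: int, head_dim: int, n_layers: int,
                   base_kv_heads: int, base_ffn_dim: int,
                   variant_kv_heads: int) -> tuple[int, int]:
    """Widen the FFN to absorb the parameters freed by dropping KV heads.

    Hypothetical helper: returns (new_ffn_dim, leftover), where leftover is
    how many parameters the variant ends up with above the baseline.
    """
    freed = (kv_proj_params(d_model, head_dim, base_kv_heads)
             - kv_proj_params(d_model, head_dim, variant_kv_heads))
    # Assumed SwiGLU FFN: gate, up, and down matrices, so each +1 of ffn_dim
    # adds 3 * d_model parameters in every layer
    per_unit = 3 * d_model * n_layers
    new_ffn_dim = base_ffn_dim + round(freed / per_unit)
    leftover = per_unit * (new_ffn_dim - base_ffn_dim) - freed
    return new_ffn_dim, leftover


# Hypothetical baseline: 8 layers x 8 heads = 64 KV heads in total
print(param_mismatch(512, 64, 8, 64, 1376, 16))  # 16 KV heads: divides evenly
print(param_mismatch(512, 64, 8, 64, 1376, 8))   # 8 KV heads: off by a few thousand
```

With these made-up numbers, the 16-head variant matches exactly, but the 8-head variant cannot. In that case 3,670,016 freed parameters divided by 12,288 per FFN unit is about 298.7, so the best integer width still leaves 4,096 extra parameters.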
2023-11-10