MLKV Small-Scale Experiments
I tried my best to give every model the same parameter count by scaling the FFNN multiplier. It's hard to match them exactly, so the counts below differ slightly.
small-llama-poetry: vanilla MHA; 64 heads; 85,074,432 params
small-llama-gqa-16-poetry: GQA; 16 heads; 85,099,008 params
small-llama-mlkv-16-poetry: MLKV; 16 heads; 85,099,008 params
small-llama-mqa-poetry: MQA; 8 heads; 85,078,528 params
small-llama-mlkv-lin-poetry: MLKV; 2 heads; 85,078,528 params
small-llama-mlkv-poetry: MLKV with FFNN KV heads; 2 heads; 85,074,432 params
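As a quick sanity check on how closely the counts match, the sketch below compares each model against the vanilla MHA model. The counts are copied from the list above; using the MHA model as the baseline and the helper name `relative_gap` are assumptions made only for illustration.

```python
# Parameter counts copied from the list above.
param_counts = {
    "small-llama-poetry": 85_074_432,
    "small-llama-gqa-16-poetry": 85_099_008,
    "small-llama-mlkv-16-poetry": 85_099_008,
    "small-llama-mqa-poetry": 85_078_528,
    "small-llama-mlkv-lin-poetry": 85_078_528,
    "small-llama-mlkv-poetry": 85_074_432,
}

# Assumption: treat the vanilla MHA model as the reference point.
baseline = param_counts["small-llama-poetry"]


def relative_gap(count: int, reference: int) -> float:
    """Signed difference from the reference, as a fraction of the reference."""
    return (count - reference) / reference


gaps = {name: relative_gap(count, baseline) for name, count in param_counts.items()}
largest_gap = max(abs(gap) for gap in gaps.values())
spread = max(param_counts.values()) - min(param_counts.values())

for name, count in param_counts.items():
    print(f"{name:30s} {count:>12,d} {gaps[name]:+.4%}")
print(f"spread: {spread:,d} params, largest relative gap: {largest_gap:.4%}")
```

The largest difference between any two models is 24,576 parameters, roughly 0.03% of the total.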