Compare Parameters Trained
Created on May 18 | Last edited on May 18
This comparison uses pythia-70m with 8k new Traditional Chinese tokens added to the vocabulary, and trains either only the embeddings, the embeddings plus attention, or all parameters.
- Green: Train only the embeddings (gpt_neox.embed_in.weight, embed_out.weight). trainable params: 59507712 || all params: 78423040 || trainable%: 75.88039433309395
- Purple: Train the embeddings and attention (everything above, plus, for every layer n: gpt_neox.layers.n.post_attention_layernorm.weight, gpt_neox.layers.n.post_attention_layernorm.bias, gpt_neox.layers.n.attention.query_key_value.weight, gpt_neox.layers.n.attention.query_key_value.bias, gpt_neox.layers.n.attention.dense.weight, gpt_neox.layers.n.attention.dense.bias). trainable params: 65817600 || all params: 78423040 || trainable%: 83.92635633609716
- Yellow: Train all parameters. trainable params: 78423040 || all params: 78423040 || trainable%: 100.0
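The parameter counts above can be reproduced with a small pure-Python sketch. It is not the code used for these runs. It lists parameter names and element counts for a pythia-70m-sized model (hidden size 512, 6 layers, MLP width 2048) using the Hugging Face GPT-NeoX naming seen in the list above. It then selects trainable names with regular expressions, one pattern set per configuration. Two things are assumptions: the vocabulary size of 58113, which is inferred from the reported embedding count (59507712 / (2 × 512)), and the names of modules the list does not mention (input_layernorm, mlp.dense_h_to_4h, mlp.dense_4h_to_h, final_layer_norm). In PyTorch you would apply the same selection by setting param.requires_grad on each entry of model.named_parameters().

```python
import re

HIDDEN = 512      # pythia-70m hidden size
LAYERS = 6        # pythia-70m layer count
FFN = 4 * HIDDEN  # MLP intermediate size (2048)
VOCAB = 58113     # assumption: inferred from 59507712 / (2 * 512)


def parameter_sizes(vocab=VOCAB, d=HIDDEN, layers=LAYERS, ffn=FFN):
    """Return {parameter name: element count}, GPT-NeoX style (assumed naming)."""
    p = {"gpt_neox.embed_in.weight": vocab * d}
    for n in range(layers):
        pre = f"gpt_neox.layers.{n}."
        p[pre + "input_layernorm.weight"] = d
        p[pre + "input_layernorm.bias"] = d
        p[pre + "post_attention_layernorm.weight"] = d
        p[pre + "post_attention_layernorm.bias"] = d
        p[pre + "attention.query_key_value.weight"] = 3 * d * d  # fused Q, K, V
        p[pre + "attention.query_key_value.bias"] = 3 * d
        p[pre + "attention.dense.weight"] = d * d
        p[pre + "attention.dense.bias"] = d
        p[pre + "mlp.dense_h_to_4h.weight"] = ffn * d
        p[pre + "mlp.dense_h_to_4h.bias"] = ffn
        p[pre + "mlp.dense_4h_to_h.weight"] = d * ffn
        p[pre + "mlp.dense_4h_to_h.bias"] = d
    p["gpt_neox.final_layer_norm.weight"] = d
    p["gpt_neox.final_layer_norm.bias"] = d
    p["embed_out.weight"] = vocab * d  # untied output projection, no bias
    return p


# Name patterns for the three runs (Green, Purple, Yellow).
EMBED = [r"^gpt_neox\.embed_in\.weight$", r"^embed_out\.weight$"]
ATTN = EMBED + [
    r"\.post_attention_layernorm\.(weight|bias)$",
    r"\.attention\.(query_key_value|dense)\.(weight|bias)$",
]
ALL = [r".*"]


def count_trainable(params, patterns):
    """Sum the sizes of parameters whose name matches any pattern."""
    trainable = sum(
        size for name, size in params.items()
        if any(re.search(pat, name) for pat in patterns)
    )
    return trainable, sum(params.values())


if __name__ == "__main__":
    params = parameter_sizes()
    for label, pats in [("Green", EMBED), ("Purple", ATTN), ("Yellow", ALL)]:
        t, a = count_trainable(params, pats)
        print(f"{label}: trainable params: {t} || all params: {a} || "
              f"trainable%: {100 * t / a:.2f}")
```

With the assumed vocabulary size, the sketch's totals agree with the three logged lines. The attention run adds 6 × 1,051,648 = 6,309,888 parameters on top of the embeddings.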