Compare Parameters Trained

a
Created on May 18 | Last edited on May 18

Using pythia-70m with 8k new Traditional Chinese tokens added to the vocabulary, this report compares three choices of trainable parameters: embeddings only, embeddings plus attention, and all parameters.
  • Green: train only the embeddings (gpt_neox.embed_in.weight, embed_out.weight). trainable params: 59507712 || all params: 78423040 || trainable%: 75.88039433309395
  • Purple: train the embeddings plus attention (the two embedding matrices above, plus, for every layer n, gpt_neox.layers.n.post_attention_layernorm.weight, gpt_neox.layers.n.post_attention_layernorm.bias, gpt_neox.layers.n.attention.query_key_value.weight, gpt_neox.layers.n.attention.query_key_value.bias, gpt_neox.layers.n.attention.dense.weight, gpt_neox.layers.n.attention.dense.bias). trainable params: 65817600 || all params: 78423040 || trainable%: 83.92635633609716
  • Yellow: train all parameters. trainable params: 78423040 || all params: 78423040 || trainable%: 100.0
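
The three counts above can be checked by hand from the parameter shapes. The following is a minimal sketch, not the code used for these runs: it builds a name-to-shape table for a GPT-NeoX model with pythia-70m's dimensions (hidden size 512, 6 layers, MLP width 2048), selects parameters by name pattern, and sums their sizes. The vocabulary size of 58113 is an assumption inferred from the reported embedding count (59507712 / (2 × 512)); the helper names `build_shapes` and `count_params` are hypothetical.

```python
import re
from math import prod

# pythia-70m dimensions; VOCAB is an assumption inferred from the
# reported embed-only count: 59507712 / (2 * 512) = 58113.
HIDDEN, LAYERS, FFN = 512, 6, 2048
VOCAB = 58113


def build_shapes():
    """Return {parameter name: shape} for the resized model (hypothetical helper)."""
    shapes = {
        "gpt_neox.embed_in.weight": (VOCAB, HIDDEN),
        "embed_out.weight": (VOCAB, HIDDEN),
        "gpt_neox.final_layer_norm.weight": (HIDDEN,),
        "gpt_neox.final_layer_norm.bias": (HIDDEN,),
    }
    for i in range(LAYERS):
        p = f"gpt_neox.layers.{i}."
        shapes.update({
            p + "input_layernorm.weight": (HIDDEN,),
            p + "input_layernorm.bias": (HIDDEN,),
            p + "post_attention_layernorm.weight": (HIDDEN,),
            p + "post_attention_layernorm.bias": (HIDDEN,),
            p + "attention.query_key_value.weight": (3 * HIDDEN, HIDDEN),
            p + "attention.query_key_value.bias": (3 * HIDDEN,),
            p + "attention.dense.weight": (HIDDEN, HIDDEN),
            p + "attention.dense.bias": (HIDDEN,),
            p + "mlp.dense_h_to_4h.weight": (FFN, HIDDEN),
            p + "mlp.dense_h_to_4h.bias": (FFN,),
            p + "mlp.dense_4h_to_h.weight": (HIDDEN, FFN),
            p + "mlp.dense_4h_to_h.bias": (HIDDEN,),
        })
    return shapes


# Name patterns for the two partial-training settings described above.
EMBED = [r"embed_in\.weight$", r"embed_out\.weight$"]
ATTENTION = [
    r"\.post_attention_layernorm\.",
    r"\.attention\.query_key_value\.",
    r"\.attention\.dense\.",
]


def count_params(shapes, patterns=None):
    """Return (trainable, total); patterns=None means everything is trainable."""
    total = sum(prod(s) for s in shapes.values())
    trainable = sum(
        prod(s) for name, s in shapes.items()
        if patterns is None or any(re.search(p, name) for p in patterns)
    )
    return trainable, total


if __name__ == "__main__":
    shapes = build_shapes()
    settings = [("embed", EMBED), ("embed + attention", EMBED + ATTENTION), ("all", None)]
    for label, patterns in settings:
        trainable, total = count_params(shapes, patterns)
        print(f"{label}: trainable params: {trainable} || all params: {total} "
              f"|| trainable%: {100 * trainable / total}")
```

With a real PyTorch model, the same patterns could be applied while iterating over `model.named_parameters()`, setting `requires_grad` to `True` only for matching names; that step is omitted here to keep the sketch dependency-free.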


Run set: 3 runs