Pythia 1B Trained Parameters Comparison
Comparing Trained Parameters
Starting from pythia-1b, 8k new Traditional Chinese tokens were added to the tokenizer. We compare three fine-tuning configurations: training only the embeddings, training the embeddings plus the attention blocks, and training all parameters (sketches of the setup follow the list below).
- Yellow: Train only embed (embed_in.weight, embed_out.weight).
- Blue: Train embed + attention (all above + layers.n.post_attention_layernorm.weight, layers.n.post_attention_layernorm.bias, layers.n.attention.query_key_value.weight, layers.n.attention.query_key_value.bias, layers.n.attention.dense.weight, layers.n.attention.dense.bias).
- Purple: Train all parameters.
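A minimal sketch of the tokenizer-extension step, assuming the standard Hugging Face add_tokens / resize_token_embeddings API; the token list here is a two-item placeholder for the actual 8k Traditional Chinese tokens:

```python
from transformers import AutoTokenizer, GPTNeoXForCausalLM

tokenizer = AutoTokenizer.from_pretrained("EleutherAI/pythia-1b")
model = GPTNeoXForCausalLM.from_pretrained("EleutherAI/pythia-1b")

# Placeholder: the actual run adds ~8k Traditional Chinese tokens.
new_tokens = ["臺灣", "注音"]
tokenizer.add_tokens(new_tokens)

# Grow embed_in / embed_out so the new vocabulary entries get embedding rows.
model.resize_token_embeddings(len(tokenizer))
```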
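And a sketch of how the blue "embed + attention" configuration could be selected by freezing everything else. The name patterns follow the GPT-NeoX parameter names listed above; keeping only the first two patterns gives the yellow run, and skipping the loop entirely gives the purple run:

```python
import re

from transformers import GPTNeoXForCausalLM

model = GPTNeoXForCausalLM.from_pretrained("EleutherAI/pythia-1b")

# Name patterns for the blue "embed + attention" run, from the list above.
TRAINABLE_PATTERNS = [
    r"embed_in\.weight",
    r"embed_out\.weight",
    r"layers\.\d+\.post_attention_layernorm\.(weight|bias)",
    r"layers\.\d+\.attention\.query_key_value\.(weight|bias)",
    r"layers\.\d+\.attention\.dense\.(weight|bias)",
]

for name, param in model.named_parameters():
    # A parameter stays trainable only if its full name matches a pattern.
    param.requires_grad = any(re.search(p, name) for p in TRAINABLE_PATTERNS)
```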
| Configuration     | Trainable Params | All Params    | Trainable % |
|-------------------|------------------|---------------|-------------|
| embed only        | 238,030,848      | 1,043,767,296 | 22.8049728  |
| embed + attention | 506,662,912      | 1,043,767,296 | 48.5417500  |
| all params        | 1,043,767,296    | 1,043,767,296 | 100.0       |
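The counts themselves can be reproduced with a small helper (a minimal sketch modeled on PEFT's print_trainable_parameters, not necessarily the exact script behind the table):

```python
def print_trainable_parameters(model) -> None:
    # Count parameters that will receive gradients versus the total.
    trainable = sum(p.numel() for p in model.parameters() if p.requires_grad)
    total = sum(p.numel() for p in model.parameters())
    print(f"trainable params: {trainable:,} || all params: {total:,} "
          f"|| trainable%: {100 * trainable / total:.7f}")

# For the blue run above this should report 506,662,912 trainable parameters.
print_trainable_parameters(model)
```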