
Pythia 1B: Trained Parameters Comparison


Comparing Trained Parameters

Starting from pythia-1b, 8k new Traditional Chinese tokens were added to the tokenizer; a minimal sketch of that step is shown below. Three fine-tuning configurations are then compared: training only the embeddings, training the embeddings plus attention, and training all parameters.
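The token-extension step might look like the following, assuming the standard Hugging Face transformers API; the token file `new_tokens.txt` is a hypothetical name, not taken from the report.

```python
# A minimal sketch of extending the Pythia tokenizer with new tokens.
from transformers import AutoTokenizer, GPTNeoXForCausalLM

tokenizer = AutoTokenizer.from_pretrained("EleutherAI/pythia-1b")
model = GPTNeoXForCausalLM.from_pretrained("EleutherAI/pythia-1b")

# new_tokens.txt (hypothetical) holds the 8k Traditional Chinese tokens,
# one per line.
with open("new_tokens.txt", encoding="utf-8") as f:
    new_tokens = [line.strip() for line in f if line.strip()]

num_added = tokenizer.add_tokens(new_tokens)
# Grow embed_in / embed_out to cover the enlarged vocabulary.
model.resize_token_embeddings(len(tokenizer))
print(f"added {num_added} tokens; vocab size is now {len(tokenizer)}")
```

The three runs, by color: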
  • Yellow: train only the embeddings (embed_in.weight, embed_out.weight).
  • Blue: train embeddings + attention (everything above, plus layers.n.post_attention_layernorm.weight, layers.n.post_attention_layernorm.bias, layers.n.attention.query_key_value.weight, layers.n.attention.query_key_value.bias, layers.n.attention.dense.weight, and layers.n.attention.dense.bias for every layer n).
  • Purple: train all parameters. A sketch of selecting these parameter groups follows the list.
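One way to implement the three configurations is to toggle requires_grad by parameter-name suffix. This is a minimal sketch assuming PyTorch and the GPT-NeoX parameter names listed above, not the report's exact training code.

```python
# Parameter-name suffixes for each trainable group (GPT-NeoX naming).
EMBED_SUFFIXES = ("embed_in.weight", "embed_out.weight")
ATTENTION_SUFFIXES = (
    "post_attention_layernorm.weight",
    "post_attention_layernorm.bias",
    "attention.query_key_value.weight",
    "attention.query_key_value.bias",
    "attention.dense.weight",
    "attention.dense.bias",
)

def set_trainable(model, mode: str) -> None:
    """Unfreeze only the group named by `mode`:
    'embed', 'embed+attention', or 'all'."""
    for name, param in model.named_parameters():
        if mode == "all":
            param.requires_grad = True
        elif mode == "embed":
            param.requires_grad = name.endswith(EMBED_SUFFIXES)
        elif mode == "embed+attention":
            param.requires_grad = name.endswith(
                EMBED_SUFFIXES + ATTENTION_SUFFIXES
            )
        else:
            raise ValueError(f"unknown mode: {mode}")
```

For example, `set_trainable(model, "embed+attention")` selects the Blue run's parameter set.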
| Configuration     | Trainable params | Total params  | Trainable % |
|-------------------|------------------|---------------|-------------|
| embed only        | 238,030,848      | 1,043,767,296 | 22.80       |
| embed + attention | 506,662,912      | 1,043,767,296 | 48.54       |
| all params        | 1,043,767,296    | 1,043,767,296 | 100.00      |
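The counts in the table can be reproduced by summing parameter sizes after freezing. A minimal sketch, assuming a PyTorch model whose requires_grad flags were set by set_trainable above:

```python
# Count trainable vs. total parameters, matching the table's columns.
def report_trainable(model) -> None:
    trainable = sum(p.numel() for p in model.parameters() if p.requires_grad)
    total = sum(p.numel() for p in model.parameters())
    print(f"trainable: {trainable:,} / {total:,} "
          f"({100.0 * trainable / total:.2f}%)")
```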

[W&B run set: 3 runs]