
Transformers for translation

Created on October 2 | Last edited on October 2

Effect of number of layers


[Plot: BLEU (y-axis, ticks 0.08 to 0.18) against num_layers (x-axis, ticks 2 to 9)]

  • The number of layers alone did not determine the BLEU score.

Effect of number of heads



  • A higher number of heads mattered more for performance than, for example, the number of layers.

Effect of dropout



  • Any non-zero dropout value worked.

Best lr and optimizer



  • Learning rates around 1e-3 and 1e-4 performed best.
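
Since the useful learning rates span an order of magnitude, a follow-up search would probably sample that range on a log scale rather than linearly. The snippet below is a minimal sketch of that idea and is not taken from the original runs: the function name `sample_learning_rate`, the bounds, and the seed are all assumptions chosen for illustration.

```python
import math
import random


def sample_learning_rate(rng, low=1e-4, high=1e-3):
    """Draw one learning rate log-uniformly from [low, high].

    The default bounds are an assumption based on the 1e-4 to 1e-3
    band noted in the report, not the search space actually used.
    """
    log_low, log_high = math.log10(low), math.log10(high)
    # Sample the exponent uniformly, then map back to a learning rate.
    return 10 ** rng.uniform(log_low, log_high)


rng = random.Random(0)  # fixed seed so the draws are repeatable
candidates = [sample_learning_rate(rng) for _ in range(5)]
```

Sampling the exponent instead of the value itself gives the lower part of the range (near 1e-4) as much coverage as the upper part (near 1e-3).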


Parameter importance




Overall


Run set: 106

  • Overall, Adam with a low learning rate and 6 heads and 6 layers gives good performance.
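
The recommended setup can be written down as a small configuration object. This is a sketch only: the class `TransformerConfig` and its field names are hypothetical. The values for `dropout`, `learning_rate`, and `d_model` are assumptions, since the report gives only "non-zero", "low", and nothing at all for the model width. The validation step reflects a constraint in common multi-head attention implementations, which split the model width evenly across heads, so a 6-head model needs a width divisible by 6.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TransformerConfig:
    """Hypothetical container for the settings the report found to work well."""

    num_layers: int = 6           # from the report
    num_heads: int = 6            # from the report
    optimizer: str = "adam"       # from the report
    learning_rate: float = 1e-4   # assumed value inside the reported band
    dropout: float = 0.1          # assumed; the report only says non-zero
    d_model: int = 384            # assumed; must be divisible by num_heads

    def validate(self):
        # Each head gets d_model // num_heads dimensions, so the split must be exact.
        if self.d_model % self.num_heads != 0:
            raise ValueError("d_model must be divisible by num_heads")
        if not 0.0 < self.dropout < 1.0:
            raise ValueError("dropout must be strictly between 0 and 1")
        return self


best = TransformerConfig().validate()
head_dim = best.d_model // best.num_heads  # per-head width
```

With 6 heads, a width such as 512 would fail validation, while 384 splits evenly across the heads.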