Transformers for translation
Created on October 2 | Last edited on October 2
Effect of number of layers
- The number of layers alone clearly did not determine the BLEU score.
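For reference, BLEU scores a candidate translation by its clipped n-gram overlap with a reference, times a brevity penalty for candidates that are too short. The sketch below is a minimal, unsmoothed sentence-level version (one reference, n-grams up to 4) written only to illustrate the metric; the report doesn't say which BLEU implementation the runs used, and standard tools compute a corpus-level score with their own tokenization.

```python
import math
from collections import Counter

def ngrams(tokens, n):
    """Count every n-gram (as a tuple) in a list of tokens."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))

def sentence_bleu(candidate, reference, max_n=4):
    """Unsmoothed sentence-level BLEU against a single reference, in [0, 1]."""
    log_precisions = []
    for n in range(1, max_n + 1):
        cand = ngrams(candidate, n)
        ref = ngrams(reference, n)
        # Clip each candidate n-gram count by how often it occurs in the reference.
        overlap = sum(min(count, ref[gram]) for gram, count in cand.items())
        total = sum(cand.values())
        if overlap == 0 or total == 0:
            return 0.0  # a zero precision makes the geometric mean zero
        log_precisions.append(math.log(overlap / total))
    # Brevity penalty: only candidates shorter than the reference are penalized.
    c, r = len(candidate), len(reference)
    bp = 1.0 if c > r else math.exp(1 - r / c)
    return bp * math.exp(sum(log_precisions) / max_n)

ref = "the cat sat on the mat".split()
print(sentence_bleu(ref, ref))  # identical sentences -> 1.0
```

Dropping the last word ("the cat sat on the") keeps every n-gram precision at 1 but triggers the brevity penalty, giving exp(1 - 6/5) ≈ 0.82.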
Effect of number of heads
- A higher number of heads mattered more for performance than, for example, the number of layers.
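In standard multi-head attention (as in the original Transformer), the model width d_model is split evenly across the heads, so at a fixed width more heads means more, smaller attention subspaces rather than more parameters. The sketch below shows only that split on one token's already-projected vector; the learned Q/K/V projections and the attention computation are omitted, and the widths used are hypothetical because the report doesn't state d_model.

```python
def split_heads(vector, num_heads):
    """Split one token's projected d_model-sized vector into num_heads equal slices.

    Equivalent to reshaping a (d_model,) vector to (num_heads, d_model // num_heads).
    """
    d_model = len(vector)
    if d_model % num_heads != 0:
        # PyTorch's nn.MultiheadAttention enforces the same divisibility rule.
        raise ValueError(f"d_model={d_model} is not divisible by num_heads={num_heads}")
    d_head = d_model // num_heads
    return [vector[i * d_head:(i + 1) * d_head] for i in range(num_heads)]

print(split_heads(list(range(12)), 6))  # six heads of size 2
```

With 6 heads, a common width such as 512 does not divide evenly, so a 6-head model needs a width like 384 or 768 (each giving 64 dimensions per head).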
Effect of dropout
- Performance is fine as long as dropout is nonzero.
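The dividing line at zero follows from what dropout does: with p = 0 it is the identity, so the network trains with no dropout regularization at all. The minimal sketch below shows inverted dropout (the variant PyTorch's nn.Dropout uses, rescaling survivors by 1/(1 - p) during training) on a plain list of floats; the function name and signature are illustrative, not taken from the report's code.

```python
import random

def dropout(values, p, training=True, rng=None):
    """Inverted dropout on a list of floats."""
    if not 0.0 <= p < 1.0:
        raise ValueError("p must be in [0, 1)")
    if not training or p == 0.0:
        # p == 0 (or eval mode) is the identity: no noise, so no regularization.
        return list(values)
    rng = rng or random.Random()
    keep = 1.0 - p
    # Zero each element with probability p; scale survivors so the expected value is unchanged.
    return [v / keep if rng.random() < keep else 0.0 for v in values]

print(dropout([1.0] * 8, p=0.5, rng=random.Random(0)))  # each entry is 0.0 or 2.0
```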
Best learning rate and optimizer
- Learning rates around 1e-3 and 1e-4 worked best.
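With Adam, the learning rate sets the step size fairly directly because each update is normalized per parameter: on the first step the bias-corrected ratio m̂/√v̂ is about ±1, so every weight moves by roughly lr regardless of how large its gradient is. The plain-Python sketch below performs one Adam update with the default β1 = 0.9, β2 = 0.999, ε = 1e-8 from the Adam paper (also PyTorch's defaults); the report doesn't say which betas or Adam variant the runs used.

```python
import math

def init_adam_state(n):
    """Zero-initialized step counter and first/second moment estimates for n parameters."""
    return {"t": 0, "m": [0.0] * n, "v": [0.0] * n}

def adam_step(params, grads, state, lr=1e-3, beta1=0.9, beta2=0.999, eps=1e-8):
    """Apply one Adam update; mutates state and returns the new parameter list."""
    state["t"] += 1
    t = state["t"]
    new_params = []
    for i, (p, g) in enumerate(zip(params, grads)):
        # Exponential moving averages of the gradient and squared gradient.
        state["m"][i] = beta1 * state["m"][i] + (1 - beta1) * g
        state["v"][i] = beta2 * state["v"][i] + (1 - beta2) * g * g
        # Bias correction for the zero initialization.
        m_hat = state["m"][i] / (1 - beta1 ** t)
        v_hat = state["v"][i] / (1 - beta2 ** t)
        new_params.append(p - lr * m_hat / (math.sqrt(v_hat) + eps))
    return new_params

state = init_adam_state(2)
print(adam_step([0.0, 0.0], [100.0, -0.01], state, lr=1e-3))  # both move by about 1e-3
```

Gradients that differ by four orders of magnitude produce first steps of nearly the same size, which is why lr is the main knob for how far Adam moves the weights.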
Parameter importance
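The report doesn't describe how the parameter importance panel scores hyperparameters, so rather than guess at that method, here is a simpler stand-in under that caveat: the Pearson correlation between each hyperparameter and the final BLEU across runs. The run dictionaries and keys such as `num_heads` and `bleu` are hypothetical. Correlation only captures linear, one-at-a-time effects, so it can miss interactions such as layers combined with heads.

```python
import math

def pearson(xs, ys):
    """Pearson correlation between two equal-length sequences of numbers."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two equal-length sequences of at least 2 values")
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    if sx == 0 or sy == 0:
        return 0.0  # a constant column carries no linear signal
    return cov / (sx * sy)

def rank_by_correlation(runs, params, metric="bleu"):
    """Rank hyperparameter keys by |correlation| with the metric, strongest first.

    `runs` is a hypothetical list of dicts, one per run, holding numeric
    hyperparameter values and the final metric.
    """
    scores = {
        name: pearson([run[name] for run in runs], [run[metric] for run in runs])
        for name in params
    }
    return sorted(scores.items(), key=lambda item: abs(item[1]), reverse=True)
```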
Overall
Run set: 106 runs
- Overall, Adam with a low learning rate, 6 heads, and 6 layers gives good performance.
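The takeaway can be written down as a config object. This is a hypothetical sketch: the class and field names are made up, and the report gives neither the model width nor the dropout value, so d_model = 384 and dropout = 0.1 below are assumptions chosen only to satisfy the constraints the findings imply (a width divisible by 6 heads, nonzero dropout). The learning rate is taken from the low end of the 1e-3 to 1e-4 range that worked best.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class TranslationTransformerConfig:
    """Hypothetical config reflecting the report's takeaway; field names are illustrative."""
    num_layers: int = 6       # from the report
    num_heads: int = 6        # from the report
    d_model: int = 384        # assumption: any width divisible by num_heads
    dropout: float = 0.1      # assumption: any nonzero value, per the dropout result
    optimizer: str = "adam"   # from the report
    lr: float = 1e-4          # low end of the range that worked best

    def __post_init__(self):
        # Reject settings the findings (or multi-head attention itself) rule out.
        if self.d_model % self.num_heads != 0:
            raise ValueError("d_model must be divisible by num_heads")
        if not 0.0 < self.dropout < 1.0:
            raise ValueError("dropout should be nonzero and below 1")

cfg = TranslationTransformerConfig()
print(cfg.d_model // cfg.num_heads)  # per-head dimension: 64
```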