pjit script results comparable with pmap script
Comparing a t5-base training run with Adafactor across two different scripts. Notes:
* Green: pjit script, blue: pmap script
* The pjit script does not average the training loss between logs, so its curve appears more jagged
* The pmap script is slightly faster, possibly due to a suboptimal model partitioning definition
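
To illustrate the note on loss averaging, here is a minimal sketch of the two logging behaviours. It is not taken from either script: the helper `logged_losses`, the `log_every` parameter, and the loss values are all hypothetical and made up for illustration. It assumes one script logs only the most recent step's loss at each logging step, while the other logs the mean over all steps since the previous log.

```python
from statistics import mean


def logged_losses(step_losses, log_every, average):
    """Return the values that would be logged every `log_every` steps.

    average=False: log only the last step's loss in each window.
    average=True:  log the mean loss over the whole window.
    """
    logged = []
    for end in range(log_every, len(step_losses) + 1, log_every):
        window = step_losses[end - log_every:end]
        logged.append(mean(window) if average else window[-1])
    return logged


def total_variation(values):
    """Sum of absolute jumps between consecutive logged points."""
    return sum(abs(b - a) for a, b in zip(values, values[1:]))


# Hypothetical noisy per-step losses (invented numbers, not from either run).
step_losses = [1.9, 2.1, 1.4, 1.6, 1.8, 2.2, 1.5, 1.3, 1.0, 1.6, 1.2, 2.0]

point_sampled = logged_losses(step_losses, log_every=3, average=False)
window_averaged = logged_losses(step_losses, log_every=3, average=True)

print("point-sampled:  ", point_sampled, total_variation(point_sampled))
print("window-averaged:", window_averaged, total_variation(window_averaged))
```

With these made-up numbers the point-sampled curve jumps more between logs than the window-averaged one, even though both are derived from the same per-step losses. Under that assumption, a jagged curve reflects the logging method rather than a difference in training quality.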
Created on February 19|Last edited on February 19