
pjit script results comparable with pmap script

Comparison of two t5-base training runs with Adafactor, one from each script.

Notes:
* Green: pjit script; blue: pmap script.
* The pjit script does not average the training loss between log steps, so its curve appears more jagged.
* The pmap script is slightly faster, possibly due to a suboptimal model partitioning definition.
Created on February 19|Last edited on February 19
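
The note about the jagged pjit curve can be illustrated with a minimal sketch. The actual logging code of the two scripts is not shown in this report, so the function `log_every` and its parameters are hypothetical; the sketch only assumes that one script logs the mean loss over the steps since the last log while the other logs the most recent step's loss.

```python
from statistics import fmean


def log_every(losses, log_steps, average=True):
    """Return (step, value) pairs emitted every `log_steps` steps.

    Hypothetical helper for illustration only.
    average=True  -> log the mean loss since the previous log (smoother curve)
    average=False -> log only the loss of the current step (more jagged curve)
    """
    logged = []
    window = []
    for step, loss in enumerate(losses, start=1):
        window.append(loss)
        if step % log_steps == 0:
            value = fmean(window) if average else loss
            logged.append((step, value))
            window = []  # start a new window after each log
    return logged


# Made-up per-step losses, not taken from the runs in this report.
losses = [4.0, 2.0, 6.0, 5.0, 1.0, 3.0]
averaged = log_every(losses, log_steps=3, average=True)
last_only = log_every(losses, log_steps=3, average=False)
print(averaged)
print(last_only)
```

With the same underlying losses, the unaveraged log keeps the noise of individual steps, which is one plausible reason two otherwise comparable runs can look different in the plots.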

[Figure: two panels comparing the pjit and pmap runs, one plotted against train/step and one against Time (minutes).]