pjit script results comparable with pmap script

Comparing training runs of t5-base with Adafactor, produced by two different scripts.

Notes:
* Green: pjit script; blue: pmap script.
* The pjit script does not average the training loss between logging steps, so its curve appears more jagged.
* The pmap script is slightly faster, possibly due to a suboptimal model partitioning definition in the pjit script.
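
The difference in jaggedness can be illustrated with a minimal sketch. This is not code from either script: the function name `logged_losses`, its parameters, and the sample loss values are all hypothetical, and it assumes the smoother script logs the mean loss over the steps since the previous log, while the other logs only the loss of the step at which logging happens.

```python
from statistics import fmean


def logged_losses(step_losses, log_every, average=True):
    """Return (step, value) pairs as a training loop might log them.

    average=True  -> mean of all losses since the previous log (smoother curve)
    average=False -> only the loss of the logging step itself (noisier curve)
    """
    logged = []
    window = []  # losses accumulated since the last log
    for step, loss in enumerate(step_losses, start=1):
        window.append(loss)
        if step % log_every == 0:
            value = fmean(window) if average else loss
            logged.append((step, value))
            window = []  # start a fresh window after each log
    return logged


# Hypothetical per-step losses, for illustration only.
losses = [4.0, 2.0, 6.0, 5.0, 1.0, 3.0]
averaged = logged_losses(losses, log_every=3, average=True)
pointwise = logged_losses(losses, log_every=3, average=False)
```

With the sample values above, the averaged log at step 3 is the mean of the first three losses, while the pointwise log at step 3 is just the third loss, so a single noisy step shows up directly in the plotted curve.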

Yep

Created on February 19 | Last edited on February 19

[Figure: train/loss vs. train/step for runs copper-sweep-3 and woven-sweep-1]

[Figure: train/loss vs. Time (minutes) for runs copper-sweep-3 and woven-sweep-1]