Resume from checkpoint

Created on July 29 | Last edited on July 29

[Plot: run set of 1 run; x-axis train/step (20k to 140k); y-axis ticks from 5.1 to 5.22]

We resume from that checkpoint using this PR, which attempts to load the optimizer state correctly.
After resuming, I would expect eval/loss to keep decreasing along its previous trend, but instead it suddenly increases for a short while.
Here are a few runs with different warmup_steps values.
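
As a rough illustration of why warmup_steps can matter on resume, here is a minimal sketch. It assumes a plain linear warmup followed by a constant learning rate, which may not match the schedule used in these runs. The function names, the base learning rate, and the checkpoint step of 100,000 are all hypothetical. It compares the learning rate right after resuming when the scheduler's step counter is restored with the case where it restarts from zero. It is not a diagnosis of the runs shown here.

```python
def warmup_lr(step, base_lr, warmup_steps):
    """Linear warmup to base_lr, then constant (assumed, hypothetical schedule)."""
    if warmup_steps <= 0:
        return base_lr
    return base_lr * min(1.0, (step + 1) / warmup_steps)


def lr_after_resume(checkpoint_step, steps_since_resume, base_lr,
                    warmup_steps, restore_step):
    """Learning rate seen after resuming from a checkpoint.

    restore_step=True  : the scheduler continues from the checkpoint step.
    restore_step=False : the scheduler counter restarts at 0, so warmup reruns.
    """
    offset = checkpoint_step if restore_step else 0
    return warmup_lr(offset + steps_since_resume, base_lr, warmup_steps)


if __name__ == "__main__":
    base_lr = 1e-3             # hypothetical value
    checkpoint_step = 100_000  # hypothetical value
    for warmup_steps in (0, 100, 1000, 5000):
        restored = lr_after_resume(checkpoint_step, 0, base_lr, warmup_steps, True)
        restarted = lr_after_resume(checkpoint_step, 0, base_lr, warmup_steps, False)
        print(f"warmup_steps={warmup_steps}: restored={restored:.2e}, restarted={restarted:.2e}")
```

If the learning rate logged just after the resume point looks like the restarted case, the scheduler state is probably not being restored along with the optimizer. If it looks like the restored case, the bump in eval/loss has to come from somewhere else.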

[Plot: run set of 5 runs]