
NLB Maze Med pushing

Created on March 22 | Last edited on March 22

Examining exact co-bps regressions

Nothing is grossly bad anymore (the grossly bad case being ref_9, whose problem was mainly excessive dropout). We need to be quite careful here, because our finetuning is decidedly not producing much improvement (check eval).
There are still some minor diffs, where ref_12 is SoTA; perfs listed are relative to best val.
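For orientation, the co-bps deltas below are in bits per spike: the Poisson log-likelihood of the predicted rates versus a per-neuron mean-rate null, normalized by the total spike count. A minimal sketch of the metric (function name and array shapes are mine, not the eval harness):

```python
import numpy as np
from scipy.stats import poisson

def bits_per_spike(rates, spikes):
    """Co-smoothing bits/spike: Poisson log-likelihood of predicted `rates`
    minus that of a null model predicting each neuron's mean rate,
    divided by total spikes. Shapes: (trials, time, neurons)."""
    ll_model = poisson.logpmf(spikes, rates).sum()
    # Null model: one constant rate per neuron (its mean spike count per bin).
    null_rates = spikes.mean(axis=(0, 1), keepdims=True) * np.ones_like(spikes, dtype=float)
    ll_null = poisson.logpmf(spikes, null_rates).sum()
    return (ll_model - ll_null) / (spikes.sum() * np.log(2))
```

By construction the per-neuron mean-rate prediction scores exactly 0, so a -0.05 delta between two runs is a small but real likelihood gap.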
  • ref_14e:
    • -0.06 eval, -0.09 val.
    • Zero mask, f128, 4 enc 2 dec, dropout=0.8 (override dropout io)
  • ref_14d (same as ref_12):
    • -0.05 eval, -0.11 val, 0 enc 6 dec, dropout=0.8/0.6
  • ref_15g:
    • -0.08 eval, -0.09 val, f32, 4 enc 2 dec, dropout=0.6
  • All quite valid. In sum, no reason to believe encoder-decoder asymmetry matters.
  • Thus the changes that survive are simply: nonlearned position, zero mask, and the dropout override.
    • None of these should matter for a pretrained model (the position + mask embeds are frozen during tuning).
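The frozen-embedding point can be sketched directly: if the position and mask parameters are excluded from the optimizer before finetuning, the nonlearned-position and zero-mask choices are inert for a pretrained model. A minimal PyTorch sketch; the module and attribute names (`TinyEncDec`, `pos_embed`, `mask_token`) are assumptions, not the actual codebase:

```python
import torch
import torch.nn as nn

class TinyEncDec(nn.Module):
    """Stand-in for the pretrained model; only the relevant parameters."""
    def __init__(self, n_time=10, d_model=16):
        super().__init__()
        self.pos_embed = nn.Parameter(torch.zeros(n_time, d_model))
        self.mask_token = nn.Parameter(torch.zeros(d_model))
        self.proj = nn.Linear(d_model, d_model)  # everything else stays trainable

def freeze_embeddings(model):
    # Freeze position + mask embeddings so finetuning cannot move them;
    # return only the parameters that should go to the optimizer.
    for p in (model.pos_embed, model.mask_token):
        p.requires_grad_(False)
    return [p for p in model.parameters() if p.requires_grad]

model = TinyEncDec()
trainable = freeze_embeddings(model)
```

With the embeddings frozen, gradients never touch them, so their init scheme (learned vs. nonlearned position, zero vs. learned mask) cannot affect the finetuned result.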
