NLB Maze Med pushing
Created on March 22 | Last edited on March 22
Examining exact co-bps regressions
Nothing is grossly bad anymore ("grossly" being ref_9, which was mainly excessive dropout). We need to be careful here, though, because our finetuning is decidedly not producing much improvement (check eval).
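For reference, co-bps (co-smoothing bits per spike) scores predicted rates on held-out neurons by comparing their Poisson log-likelihood against a constant mean-rate null model, normalized by spike count. A minimal sketch (function and variable names are illustrative, not from our codebase):

```python
import math

def poisson_ll(rates, spikes):
    # Poisson log-likelihood, dropping the constant log(y!) term,
    # which cancels when differencing model vs. null below.
    return sum(y * math.log(r) - r for r, y in zip(rates, spikes))

def co_bps(pred_rates, spikes):
    # Null model: the neuron's constant mean firing rate.
    mean_rate = sum(spikes) / len(spikes)
    null_rates = [mean_rate] * len(spikes)
    n_spikes = sum(spikes)
    # Bits per spike: likelihood gain over the null, per spike, in bits.
    return (poisson_ll(pred_rates, spikes)
            - poisson_ll(null_rates, spikes)) / (n_spikes * math.log(2))
```

A model that predicts only the mean rate scores exactly 0, so the small negative deltas below are losses relative to ref_12's gain over the null, not absolute likelihood drops.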
Still some minor diffs, where ref_12 is SoTA; the performance deltas listed are at the best-val checkpoint.
- ref_14e:
    - -0.06 eval, -0.09 val
    - Zero mask, f128, 4 enc / 2 dec, dropout=0.8 (override dropout io)
- ref_14d (same as ref_12):
    - -0.05 eval, -0.11 val
    - 0 enc / 6 dec, dropout=0.8/0.6
- ref_15g:
    - -0.08 eval, -0.09 val
    - f32, 4 enc / 2 dec, dropout=0.6
- All quite valid. In sum, no reason to believe encoder-decoder asymmetry matters.
- Thus the changes that survive are simply: non-learned position, zero mask, and the dropout override.
- None of these should matter for a pretrained model (the position and mask embeddings are frozen during tuning).
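The surviving changes carry no learned state, which is the crux of the last point. Assuming "non-learned position" means fixed sinusoidal encodings (a hedged reading, not confirmed by the notes), a minimal sketch:

```python
import math

def sinusoidal_positions(seq_len, d_model):
    """Fixed (non-learned) sinusoidal position encodings.
    One plausible reading of the 'non-learned position' change above;
    names here are illustrative, not from our codebase."""
    pe = [[0.0] * d_model for _ in range(seq_len)]
    for pos in range(seq_len):
        for i in range(0, d_model, 2):
            angle = pos / (10000 ** (i / d_model))
            pe[pos][i] = math.sin(angle)          # even dims: sine
            if i + 1 < d_model:
                pe[pos][i + 1] = math.cos(angle)  # odd dims: cosine
    return pe
```

Because these encodings, and a zeroed mask token, are constants rather than trained parameters, freezing them during tuning is a no-op, consistent with the claim that none of these changes should affect a pretrained model.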
[Run set panel: 10 runs]