NLB Control

Created on March 23 | Last edited on March 23
Since we were seeing unsuccessful transfer on NLB co-bps, I took a step back and ran experiments closer to the Section 1 arch/base pilots.

At the outset, it seemed that transfer could in fact be worse than pretraining, but tweaking the heldout task to leverage the core enc-dec infill path fixed this (all models, including basic transfer, reach ~SoTA). (Not shown in the plots, but the relevant experiments are also in nlb_control.)
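For concreteness, routing the heldout task through the infill path amounts to masking the heldout channels at the input and supervising only their predictions at the output. A minimal numpy sketch of the batch construction (channel layout, names, and shapes are hypothetical, not the actual pipeline):

```python
import numpy as np

def make_infill_batch(spikes, n_heldin):
    """Mask heldout channels so the enc-dec infill path predicts them.

    spikes: (time, n_channels) spike counts; the first n_heldin channels
    are held-in, the rest held-out. (Hypothetical layout.)
    """
    inputs = spikes.copy().astype(float)
    inputs[:, n_heldin:] = 0.0            # zero out heldout channels at input
    loss_mask = np.zeros(spikes.shape, dtype=bool)
    loss_mask[:, n_heldin:] = True        # supervise only heldout predictions
    return inputs, loss_mask
```

The model then sees only held-in activity and must infill the heldout channels, exactly as it does for masked timesteps during pretraining.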

Co-bps probe

  • Pretraining appears to help for RTT, but not for Maze (the pretraining benefit on Maze is minor).
    • It is undetermined whether this is an issue of transfer from or to the Maze datasets (2K trials might simply not be enough).
    • Note these runs are fairly comparable to the SoTA baselines (i.e., heavy regularization evidently didn't matter much).
      • Though if 0.02 co-bps differences count as significant, maybe this is important...
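As a reference for what a 0.02 difference means here: co-bps is the Poisson log-likelihood improvement of predicted heldout-neuron rates over a per-neuron mean-rate null model, normalized by total spike count. A self-contained sketch (array shapes are assumptions, not the eval harness):

```python
import numpy as np

def poisson_ll(spikes, rates):
    # Poisson log-likelihood up to the log(y!) term, which cancels
    # in the difference against the null model
    rates = np.clip(rates, 1e-9, None)
    return np.sum(spikes * np.log(rates) - rates)

def co_bps(spikes, rates):
    """Co-smoothing bits/spike on heldout neurons.

    spikes, rates: (trials, time, neurons). The null model is each
    neuron's mean rate over trials and time.
    """
    null = np.broadcast_to(spikes.mean(axis=(0, 1)), spikes.shape)
    return (poisson_ll(spikes, rates) - poisson_ll(spikes, null)) / (
        spikes.sum() * np.log(2.0)
    )
```

By construction, predicting each neuron's mean rate scores exactly 0, so a 0.02 gap is measured against a spike-count-normalized likelihood scale, which is why significance at that resolution is plausible but not obvious.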

[Run set: 15 runs]



Kinematic decode probe

  • When accounting for validation performance as well, large-scale pretraining pulls ahead, and even Maze sees a pretraining benefit.
  • Note the optimal schedule differs from co-bps; kinematic decoding prefers faster schedules?
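For context, a kinematic decode probe of this flavor is typically a linear readout from inferred rates to velocity, scored by R². A closed-form ridge sketch (function name, shapes, and `alpha` are placeholders, not the actual probe config):

```python
import numpy as np

def ridge_decode_r2(rates, kinematics, alpha=1.0):
    """Linear kinematic probe: ridge regression from rates to kinematics,
    scored by R^2. rates: (samples, features); kinematics: (samples, dims).
    """
    X = np.hstack([rates, np.ones((len(rates), 1))])  # append bias column
    W = np.linalg.solve(X.T @ X + alpha * np.eye(X.shape[1]),
                        X.T @ kinematics)
    pred = X @ W
    ss_res = np.sum((kinematics - pred) ** 2)
    ss_tot = np.sum((kinematics - kinematics.mean(axis=0)) ** 2)
    return 1.0 - ss_res / ss_tot
```

Because the probe itself is fixed and linear, schedule-dependent differences here reflect the quality of the learned rates rather than the decoder.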

[Run set: 15 runs]