Basic NDT3 pilots
Created on August 23 | Last edited on August 23
### Basic questions
1. Can we, using a hand-coded normalization (just dividing by 0.01), produce a similar R2 to a full-blown z-score (which comes out around 0.003)?
- Yes, pretty much (compare rollback_parity_2, which uses the NDT2 codebase, vs base_obs, which uses a hardcoded 0.01 norm).
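The two normalizations can be sketched as follows. This is a minimal illustration, not NDT3 code: the variable names, the 0.01 scale, and the synthetic data are assumptions. The point is that if the covariates already have near-zero mean and a std near the fixed scale, dividing by a constant and z-scoring give nearly identical inputs.

```python
import numpy as np

# Synthetic covariates on a ~0.01 scale (illustrative; not real pilot data).
rng = np.random.default_rng(0)
vel = rng.normal(0.0, 0.01, size=(1000, 2))  # (timesteps, dims)

# Hand-coded normalization: divide by a fixed scale.
hand = vel / 0.01

# Full z-score: subtract per-dimension mean, divide by per-dimension std.
z = (vel - vel.mean(axis=0)) / vel.std(axis=0)

# When the true mean is ~0 and std is ~0.01, the two nearly coincide.
print(np.abs(hand - z).mean())
```

In that regime the downstream model sees essentially the same inputs either way, which is consistent with the two runs reaching similar R2.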
2. Does encoding covariates help/change modeling?
- Looks like a huge difference (cf. base_obs vs enc_obs). This doesn't necessarily imply a real change in modeling quality, just overfitting to open-loop trajectories. I suspect models aren't good at producing closed-loop stereotyped trajectories without some autoregressive modeling.
3. Does the encoding improvement in neural prediction depend on the setting being closed loop? (Open loop is mostly predictable, i.e., mostly autonomous.)
- TBD
exp/ndt3_pilots
4. (Left) Does expanding to grasp data work nontrivially?
- Yes, grasp R2 > 0 by a decent margin once we include it in training.
5. (Right) Does expanding the loss to many dims work if they are useless? (Is blacklisting dims necessary, or are dead streams OK?)
- The more dimensions there are, the slower the learning and the worse the peak performance. (Here, we're reading all dimensions from the same token.)
- This is likely a capacity issue, solvable by tokenizing behavior dims.
- This effect remains even if we're encoding covariates. (Good, makes sense.)
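The capacity argument can be sketched in terms of readout shapes. This is an illustration under assumed shapes, not the NDT3 architecture: reading all behavior dims from one token means a single D-dim hidden state must linearly explain every dim, while one token per dim gives each dim its own hidden state.

```python
import numpy as np

# Illustrative shapes only; names are assumptions, not NDT3 internals.
T, D, B = 100, 256, 8   # timesteps, model dim, behavior dims

# (a) All behavior dims read from a single token per timestep:
#     one shared D-dim state feeds a (D, B) readout for all dims.
single_token_hidden = np.zeros((T, D))
readout_all = np.zeros((D, B))
pred_a = single_token_hidden @ readout_all                          # (T, B)

# (b) One token per behavior dim: each dim has its own D-dim state,
#     so representational capacity scales with B instead of being shared.
per_dim_hidden = np.zeros((T, B, D))
readout_per_dim = np.zeros((B, D))
pred_b = np.einsum('tbd,bd->tb', per_dim_hidden, readout_per_dim)   # (T, B)

print(pred_a.shape, pred_b.shape)
```

Under scheme (a), adding dead dims directly competes for the one shared state, which would explain slower learning and a lower peak; under (b), dead streams mostly waste their own tokens.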
Adding more dimensions
### Encoding effect on neural data
- Maybe light evidence that CL models achieve a better shuffle than NDT2 base models (cf. enc_cl vs base_cl). No clear evidence comparing base_full_cl and enc_full_cl, though I'm surprised it