The challenge has been to get zsum and zmix to equal each other for points "on the fringe".
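For reference, a minimal sketch of that consistency term, assuming zsum is the sum of the per-stem embeddings and zmix is the embedding of the mixed signal, with an MSE penalty between them (names, shapes, and the exact loss form are assumptions, not the actual code):

```python
import torch
import torch.nn.functional as F

def mix_loss(z_stems: torch.Tensor, z_mix: torch.Tensor) -> torch.Tensor:
    """Penalize disagreement between the sum of stem embeddings and the mix embedding.

    z_stems: (batch, n_stems, dim) per-stem embeddings (assumed layout)
    z_mix:   (batch, dim) embedding of the mixed signal
    """
    z_sum = z_stems.sum(dim=1)        # zsum: add the stems in latent space
    return F.mse_loss(z_sum, z_mix)   # drive zsum toward zmix, including "fringe" points
```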
Adding batch norm -- either before or after the activations -- resulted in noticeably higher mix loss values and significantly higher variance loss values, though also significantly lower covariance loss values.
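For context, a sketch of the two placements tried; the activation and hidden width here are placeholders, not the actual network:

```python
import torch.nn as nn

def mlp_block(dim: int, bn: str = "none") -> nn.Sequential:
    """One hidden block with optional BatchNorm before or after the activation.

    bn: "none" | "pre"  (Linear -> BN -> activation)
               | "post" (Linear -> activation -> BN)
    """
    layers = [nn.Linear(dim, dim)]
    if bn == "pre":
        layers.append(nn.BatchNorm1d(dim))
    layers.append(nn.GELU())
    if bn == "post":
        layers.append(nn.BatchNorm1d(dim))
    return nn.Sequential(*layers)
```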
A little bit (10%) of relative loss doesn't change things much at all.
Recon loss is fine no matter what I do.
Batch size doesn't seem to help; more neurons in the network doesn't seem to help either.
Additional "vanilla" cosine annealing (with warm restarts) after 1cycle doesn't seem to help.
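A sketch of what that schedule combination could look like in PyTorch, with OneCycleLR followed by CosineAnnealingWarmRestarts; the optimizer, step budgets, and learning rates below are placeholders:

```python
import torch
from torch.optim.lr_scheduler import OneCycleLR, CosineAnnealingWarmRestarts

model = torch.nn.Linear(8, 8)                        # placeholder model
opt = torch.optim.Adam(model.parameters(), lr=1e-3)

onecycle_steps, extra_steps = 1000, 1000             # assumed step budgets
sched_1cycle = OneCycleLR(opt, max_lr=1e-3, total_steps=onecycle_steps)
sched_cosine = CosineAnnealingWarmRestarts(opt, T_0=250, T_mult=2, eta_min=1e-6)

for step in range(onecycle_steps + extra_steps):
    # ... forward / backward / opt.step() go here ...
    if step < onecycle_steps:
        sched_1cycle.step()    # 1cycle phase
    else:
        sched_cosine.step()    # additional cosine annealing with warm restarts
```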
Dec 31
Could try keeping stems=2 and only doing var & cov on zsum and zmix, not on all the stems.
*** YES! This works fine
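A minimal sketch of that restriction, using VICReg-style variance and covariance penalties applied only to zsum and zmix rather than to every stem embedding; the exact loss forms, thresholds, and weights are assumptions:

```python
import torch

def var_cov_losses(z: torch.Tensor, eps: float = 1e-4):
    """VICReg-style variance and covariance penalties on a (batch, dim) embedding batch."""
    z = z - z.mean(dim=0)
    std = torch.sqrt(z.var(dim=0) + eps)
    var_loss = torch.relu(1.0 - std).mean()          # keep each dimension's std above 1
    n, d = z.shape
    cov = (z.T @ z) / (n - 1)
    off_diag = cov - torch.diag(torch.diag(cov))
    cov_loss = off_diag.pow(2).sum() / d             # decorrelate embedding dimensions
    return var_loss, cov_loss

# Apply only to zsum and zmix, not to each stem embedding:
# var_sum, cov_sum = var_cov_losses(z_sum)
# var_mix, cov_mix = var_cov_losses(z_mix)
# reg_loss = (var_sum + var_mix) + (cov_sum + cov_mix)
```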
Jan 1
Trying a different nonlinear model now.
- Devil's advocate: the model isn't learning the nonlinearity at all; it's just fitting the middle points, which are roughly linear.
- Counterpoint: but look at the straight lines in the bottom-right graphs.