make long_range dataset work
I want to tinker with what I have so far to get the model to start showing genuinely useful results on this dataset. The way the dataset is constructed feels like it inherently tests exactly what I'm seeking to prove about the model.
Created on August 6|Last edited on August 6
Let's look at an example input/output from the dataset:
```
715750 0.07% (1645m 17s) 1.2082 avg: 1.15430 Sequence:
00000??00!?00
000000?000?0 ✓
716000 0.07% (1645m 50s) 1.0711 avg: 1.15474 Sequence:
000?,00!,00000
0000000000000 ✓
716250 0.07% (1646m 23s) 1.0552 avg: 1.15159 Sequence:
000?80!800
000000000 ✓
```
The dataset is a sequence of characters in which a ? is followed by a character that will certainly be repeated later in the sequence, and the point where it reappears is marked by a preceding !. A random number of 0s fills everywhere else.
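The construction described above can be sketched as a small generator. This is a reconstruction from the samples, not the actual dataset code: the payload alphabet, the maximum gap length, and the function name are all assumptions.

```python
import random

FILLER = "0"
# Hypothetical payload alphabet; the real one is not shown in the log,
# but the samples include characters like ',' and '8'.
PAYLOADS = "abcdefghijklmnopqrstuvwxyz12345678,."

def make_sequence(rng, max_fill=6):
    """Build one example: filler zeros, '?' plus a payload character,
    more zeros, '!' plus the same payload, then more zeros.

    Sketch reconstructed from the printed samples; gap lengths are
    drawn uniformly, which is an assumption.
    """
    payload = rng.choice(PAYLOADS)
    fill = lambda: FILLER * rng.randint(0, max_fill)
    return fill() + "?" + payload + fill() + "!" + payload + fill()

print(make_sequence(random.Random(0)))
```

The long-range dependency is exactly the distance between the ? and the !: everything in between is unpredictable filler, so the only way to score on the character after ! is to carry the payload forward.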
Vary plast_learning_rate, the rate at which the plasticity parameters update. This uses a newer initialization method I've been experimenting with that tries to start from values I consider more desirable. It has a plast_clip of 5e3 and an lr of 1e-4, so it may cut it close as far as explosion goes.
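For concreteness, here is a minimal sketch of the mechanism the hyperparameters refer to: the plasticity parameters get their own learning rate and are bounded by plast_clip. The plain SGD step and the symmetric clipping are assumptions; only the parameter names and the 5e3 / 1e-4 values come from the text.

```python
def plasticity_step(plast, grad, plast_learning_rate=1e-3, plast_clip=5e3):
    """One update of a plasticity parameter (sketch).

    plast_learning_rate controls how fast plasticity moves; plast_clip
    bounds its magnitude, which is what delays the explosions discussed
    in the runs. The exact update rule used in the experiments is not
    shown, so vanilla SGD plus a hard clip is assumed here.
    """
    plast = plast - plast_learning_rate * grad
    return max(-plast_clip, min(plast_clip, plast))
```

With a large plast_learning_rate (e.g. 1e-2), the updates can race to the clip boundary long before the rest of the network settles, which matches the early divergence observed below.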
Indeed, we can see that the plast_learning_rate of 1e-2 (0.01) explodes early.
[Run set: 4 runs]
Static
OK, concurrently, let's try to get it running in a static-plasticity scenario again. In the past, runs with an lr even as low as 1e-5 diverged, and larger values diverged even faster, weirdly enough, even despite having learnable plasticity. So let's go lower this time, while also scaling the initialized plasticities in different ways.
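The grid being described, lower learning rates crossed with different scalings of the initial plasticities, might look like the following. Everything here is hypothetical: the Gaussian init, the function name, and the particular lr and scale values are illustrative, not from the runs.

```python
import random

def init_static_plasticity(n, scale, seed=0):
    """Fixed (non-learnable) plasticity values, scaled at init.

    Sketch only: the text says the initialized plasticities are scaled
    in different ways; the Gaussian base init is an assumption.
    """
    rng = random.Random(seed)
    return [scale * rng.gauss(0.0, 1.0) for _ in range(n)]

# Hypothetical sweep: lrs below the previously-diverging 1e-5,
# crossed with a few init scales.
for lr in (1e-6, 3e-6, 1e-5):
    for scale in (0.1, 1.0, 10.0):
        plast = init_static_plasticity(4, scale)
        print(f"lr={lr:.0e} scale={scale}", [round(p, 3) for p in plast])
```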
[Run set: 8 runs]