
griffin aka recurrent_gemma arch

Some initial experiments varying the activation and layer count on simple_wikipedia_LM.
Created on April 25 | Last edited on April 25
example output model can be found here

Section 1


[Charts: metrics plotted against train/epoch]
Run set
4


