6M Parameter models
A selection of 6M-parameter GPT-J models with varied architectures, trained on architectural design data. These are part of a larger scaling-laws experiment covering models ranging from 2M to 2B parameters.
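As a rough sanity check on what "6M parameters" implies for a GPT-J-style decoder, the sketch below estimates a parameter count from a hypothetical small configuration (the vocabulary size, hidden width, and layer count here are illustrative assumptions, not the actual training configs):

```python
def approx_gptj_params(vocab_size: int, d_model: int, n_layers: int) -> int:
    """Rough parameter count for a GPT-J-style decoder-only transformer.

    Counts token embeddings, per-layer attention (4 * d^2 for Q/K/V/out) and
    MLP (8 * d^2 for the 4x-wide feed-forward), layer norms, and assumes the
    output head is tied to the embedding matrix.
    """
    embed = vocab_size * d_model              # token embedding table
    per_layer = 12 * d_model * d_model + 2 * d_model  # attn + MLP + layernorm
    final_ln = 2 * d_model                    # final layer norm (scale + bias)
    return embed + n_layers * per_layer + final_ln

# Hypothetical config in the ~6M range: small vocab, width 256, 6 layers.
total = approx_gptj_params(vocab_size=2048, d_model=256, n_layers=6)
print(f"~{total / 1e6:.1f}M parameters")  # → ~5.2M parameters
```

Varying `d_model` and `n_layers` in a formula like this is one way sweeps from 2M up to 2B parameters are typically laid out, though the actual shapes used in these runs may differ.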
Created on April 5 | Last edited on April 5