Created by
Created On
Last edited
Autoregressive Distillation
All models are EleutherAI GPT-NeoX models, loosely based on Megatron-LM and GPT-3. Each model is named after its number of non-embedding parameters. A model named "X to Y" is a distillation of a model of size X into a model of size Y.
All models are trained on the Pile with Rotary Embeddings.
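The report does not state which distillation objective was used. As a rough illustration of what distilling a model of size X into a model of size Y can look like for autoregressive models, the sketch below assumes the common soft-target setup: at each token position, the student's next-token distribution is pulled toward the teacher's with a temperature-scaled KL divergence. The function names, the temperature value, and the toy logits are all hypothetical and not taken from the GPT-NeoX codebase.

```python
import math


def log_softmax(logits, temperature=1.0):
    """Numerically stable log-softmax of one row of logits at a given temperature."""
    scaled = [x / temperature for x in logits]
    m = max(scaled)
    log_sum_exp = m + math.log(sum(math.exp(x - m) for x in scaled))
    return [x - log_sum_exp for x in scaled]


def distillation_loss(teacher_logits, student_logits, temperature=2.0):
    """Mean over token positions of T^2 * KL(teacher || student).

    Both arguments are lists of rows, one row of vocabulary logits per
    token position. This soft-target loss is an assumption for
    illustration, not the objective confirmed by the report.
    """
    if len(teacher_logits) != len(student_logits):
        raise ValueError("teacher and student must cover the same positions")
    total = 0.0
    for t_row, s_row in zip(teacher_logits, student_logits):
        log_p = log_softmax(t_row, temperature)  # teacher (size X)
        log_q = log_softmax(s_row, temperature)  # student (size Y)
        total += sum(math.exp(lp) * (lp - lq) for lp, lq in zip(log_p, log_q))
    return temperature ** 2 * total / len(teacher_logits)


# Toy example: 2 token positions, vocabulary of 3 tokens.
teacher = [[2.0, 0.5, -1.0], [0.1, 0.2, 3.0]]
student = [[1.5, 0.7, -0.5], [0.0, 0.5, 2.0]]
loss = distillation_loss(teacher, student)
```

In practice this term is often mixed with the ordinary next-token cross-entropy on the training data; whether the models in this report do so is not stated here.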
2021-08-02