Reports
Created by
Created On
Last edited
0
2021-09-17
0
2021-08-17
Autoregressive Distillation
All models are EleutherAI GPT-NeoX models, loosely based on Megatron-LM and GPT-3. Each model is named after its number of non-embedding parameters. A model named "X to Y" is a distillation of a model of size X into a model of size Y. All models use rotary positional embeddings and are trained on the Pile.
2
2021-08-02