eleutherai

Top-k Distillation

preetham-gali

2021-09-17


Comparing runtime

preetham-gali

2021-08-17


Autoregressive Distillation

All models are EleutherAI GPT-NeoX models, loosely based on Megatron-LM and GPT-3. Each model is named after its number of non-embedding parameters. A model named "X to Y" is a distillation of a model of size X into a model of size Y. All models are trained on the Pile with rotary embeddings.

stellaathena

2021-08-02
