[RWKV-infctx] DeepSpeed 2 / 3 comparisons
The following compares DeepSpeed stage 2 and stage 3 (each with and without CPU offloading) running on a single machine with 2 x A5000 GPUs (NVLink, 24 GB VRAM each, rented via vast.ai). All other training parameters are kept the same across runs, including the dataset (enwiki_10k) and the training length (1 epoch).
2023-07-10
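The comparison described above amounts to a 2 x 2 grid: two DeepSpeed stages, each with and without CPU offloading, and everything else held fixed. A minimal sketch of that grid follows. It assumes the trainer accepts PyTorch Lightning style strategy strings (`deepspeed_stage_2`, `deepspeed_stage_3`, and their `_offload` variants); the dictionary keys (`devices`, `max_epochs`, `dataset`) and the helper names are hypothetical labels for illustration, not the report's actual config.

```python
from itertools import product

# The two axes being compared in the report.
STAGES = (2, 3)
OFFLOAD = (False, True)


def strategy_name(stage: int, offload: bool) -> str:
    """Build a Lightning-style DeepSpeed strategy string (assumed naming)."""
    suffix = "_offload" if offload else ""
    return f"deepspeed_stage_{stage}{suffix}"


def comparison_runs() -> list[dict]:
    """Return one settings dict per run; only the strategy differs."""
    # Hypothetical keys: held constant across all runs, as the report states.
    fixed = {"devices": 2, "max_epochs": 1, "dataset": "enwiki_10k"}
    return [
        {"strategy": strategy_name(stage, offload), **fixed}
        for stage, offload in product(STAGES, OFFLOAD)
    ]


if __name__ == "__main__":
    for run in comparison_runs():
        print(run)
```

Keeping the fixed settings in one dictionary makes it harder for a run to drift from the others by accident, which is the property the comparison depends on.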