
Scaling (across multiple nodes)

The general scheme is mp = 2 (across NVLink bridges), pp = n_gpus / 2 (across all other connections), micro-batch size 16, and gradient accumulation steps 64. This may not be the ideal setup, but it should give a general idea of the architecture's scalability. The NVLink-pair topology appears to scale sub-linearly, and the logs suggest the pipeline-parallel connections become the bottleneck as we scale.
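The scheme above can be sketched as a small helper that derives the parallel degrees for a given GPU count. This is a minimal sketch under the assumption of Megatron-style naming (mp = tensor/model parallel, pp = pipeline parallel) with no data parallelism; the function name and structure are illustrative, not taken from the actual launch scripts.

```python
def parallel_layout(n_gpus: int, micro_batch: int = 16, grad_accum: int = 64):
    """Derive the parallel degrees used in this report's scaling scheme.

    Assumes mp = 2 (one tensor-parallel group per NVLink pair) and
    pipeline parallelism across everything else; data parallelism is 1.
    """
    mp = 2                       # tensor parallel across each NVLink pair
    assert n_gpus % mp == 0, "GPU count must be divisible by mp"
    pp = n_gpus // mp            # pipeline parallel across all other links
    global_batch = micro_batch * grad_accum  # data-parallel degree is 1
    return {
        "mp": mp,
        "pp": pp,
        "micro_batch": micro_batch,
        "grad_accum": grad_accum,
        "global_batch": global_batch,
    }

# 24 GPUs (the 16B run) -> mp = 2, pp = 12
print(parallel_layout(24))
```

Note that pp grows linearly with the GPU count while mp stays fixed at 2, which is consistent with the pipeline links becoming the scaling bottleneck.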
Created on March 22 | Last edited on March 22

FLOPs/s/GPU


[Chart: FLOPs/s/GPU per configuration (4B / 6 GPUs, 8B / 12 GPUs, 12B / 18 GPUs, 16B / 24 GPUs); y-axis 0 to 80 TFLOP/s]
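For reference, the per-GPU throughput plotted above is typically computed as total model FLOPs per second divided by the GPU count. Here is a minimal sketch using the common 6 N T estimate for a dense transformer (forward plus backward); this estimator is an assumption, since the report does not state how the metric was derived, and it ignores attention and activation-recomputation terms.

```python
def achieved_flops_per_gpu(n_params: float, tokens_per_sec: float, n_gpus: int) -> float:
    """Rough achieved FLOPs/s/GPU via the 6 * N * T dense-transformer estimate.

    n_params:       total model parameters (e.g. 16e9 for the 16B run)
    tokens_per_sec: measured training throughput in tokens/s (from logs)
    n_gpus:         number of GPUs in the run
    """
    total_flops_per_sec = 6.0 * n_params * tokens_per_sec
    return total_flops_per_sec / n_gpus

# Illustrative numbers only (throughput here is hypothetical, not from the logs):
# a 16B model at 100k tokens/s on 24 GPUs would give 4e14 FLOPs/s/GPU (0.4 PFLOP/s).
print(achieved_flops_per_gpu(16e9, 1e5, 24))
```

Because pp scales with GPU count while the global batch stays fixed, pipeline bubble overhead grows with cluster size, which would show up in this metric as the sub-linear trend seen in the chart.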