Scaling (across multiple nodes)
The general scheme is mp = 2 (across NVLink bridges), pp = n_gpus/2 (across all other connections), mb_size = 16, g.a.s = 64. This may not be the ideal setup, but it should give us a general idea of the architecture's scalability. As we can see, the NVLink-pairs topology appears to scale sub-linearly. The logs suggest that the pipeline-parallel connections become the bottleneck as we scale.
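To make the scheme concrete, here is a minimal sketch of the layout arithmetic. It assumes that mp and pp are the model-parallel and pipeline-parallel degrees, that mb_size is the per-GPU micro-batch size, and that g.a.s is the number of gradient accumulation steps. The names `Layout`, `layout_for`, and `scaling_efficiency` are hypothetical helpers written for illustration and are not taken from the training code.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Layout:
    n_gpus: int
    mp: int            # model-parallel degree (assumed meaning of "mp")
    pp: int            # pipeline-parallel degree (assumed meaning of "pp")
    dp: int            # data-parallel degree left over after mp and pp
    global_batch: int  # samples per optimizer step


def layout_for(n_gpus: int, mp: int = 2, mb_size: int = 16, gas: int = 64) -> Layout:
    """Derive the parallel layout for the scheme mp = 2, pp = n_gpus / 2."""
    if n_gpus % mp != 0:
        raise ValueError("n_gpus must be divisible by mp")
    pp = n_gpus // mp
    # mp * pp already uses every GPU, so the data-parallel degree is 1.
    dp = n_gpus // (mp * pp)
    # Assumption: global batch = micro-batch size * accumulation steps * dp.
    return Layout(n_gpus, mp, pp, dp, mb_size * gas * dp)


def scaling_efficiency(flops_per_gpu: float, baseline_flops_per_gpu: float) -> float:
    """Per-GPU throughput relative to a baseline run; 1.0 means linear scaling."""
    return flops_per_gpu / baseline_flops_per_gpu


if __name__ == "__main__":
    for n in (2, 4, 8, 16):
        print(layout_for(n))
```

Under these assumptions the global batch size stays fixed as GPUs are added, and only the pipeline depth grows. A value of `scaling_efficiency` below 1.0 for the larger runs would correspond to the sub-linear scaling described above.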
Created on March 22|Last edited on March 22
[Plot: Flops/s/GPU, three run sets]