
Subclassed Tensor vs torch.Tensor GPU Throughput

Created on June 10 | Last edited on June 10
This report shows a consistent decrease in GPU throughput when training with the subclassed tensor defined below instead of a plain torch.Tensor.
import torch

class SubClassedTensor(torch.Tensor):
    pass
All runs used a torchvision ResNet50, 224px image size, a batch size of 64, and mixed precision. The script for training can be found here.
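As a rough illustration of the comparison described above, the sketch below times forward and backward passes for a plain tensor and for the same data viewed as a SubClassedTensor, then reports images per second. It is not the training script used for these runs: the `images_per_second` helper, the tiny stand-in model, and the small batch are hypothetical choices made so the example runs quickly on any device, and mixed precision is omitted.

```python
import time

import torch
import torch.nn as nn


class SubClassedTensor(torch.Tensor):
    pass


def images_per_second(model, batch, steps=5):
    """Hypothetical helper: time forward/backward passes, return images/second."""
    # One warm-up step so one-time setup costs are not counted.
    model(batch).sum().backward()
    if torch.cuda.is_available():
        torch.cuda.synchronize()  # wait for queued GPU work before timing
    start = time.perf_counter()
    for _ in range(steps):
        model.zero_grad(set_to_none=True)
        model(batch).sum().backward()
    if torch.cuda.is_available():
        torch.cuda.synchronize()
    elapsed = time.perf_counter() - start
    return steps * batch.shape[0] / elapsed


device = "cuda" if torch.cuda.is_available() else "cpu"

# Assumption: a tiny stand-in model; the report used a torchvision ResNet50.
model = nn.Sequential(
    nn.Conv2d(3, 8, kernel_size=3),
    nn.ReLU(),
    nn.AdaptiveAvgPool2d(1),
    nn.Flatten(),
    nn.Linear(8, 10),
).to(device)

# Assumption: a small batch for speed; the report used 64 images at 224px.
plain = torch.randn(8, 3, 32, 32, device=device)
subclassed = plain.as_subclass(SubClassedTensor)  # same data, different type

plain_ips = images_per_second(model, plain)
sub_ips = images_per_second(model, subclassed)
print(f"torch.Tensor:     {plain_ips:.1f} images/second")
print(f"SubClassedTensor: {sub_ips:.1f} images/second")
```

With a model this small the timings are noisy, so any gap seen here should not be read as a reproduction of the results in this report.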

Volta V100


[Plot: Images/Second vs. Step for SubClassedTensor and torch.Tensor]
Run set: 6


Ampere 3080 Ti


Run set: 10


Ampere 3080 Ti: Channels Last


Run set: 10