Subclassed Tensor vs torch.Tensor GPU Throughput