Tensor format to utilize tensor cores?

jayz · May 30, 2022, 10:15am

Hi,

in the performance guide (Performance Tuning Guide — PyTorch Tutorials 2.1.1+cu121 documentation), it says:

To use Tensor Cores: set sizes to multiples of 8 (to map onto dimensions of Tensor Cores)

Does this mean that when I have a BCHW tensor of shape (32, 15, 10, 256), operations on it inside the autocast() context manager will not be mapped to tensor cores at all, because the C and H dimensions are not multiples of 8? In my application, H=10 is a fixed parameter that cannot be changed. Does that mean I won’t be able to utilize tensor cores at all?

Furthermore, I also saw in the performance guide that the channels_last memory format may be preferable when working with tensor cores. So, following the guide (Performance Tuning Guide — PyTorch Tutorials 2.1.1+cu121 documentation), I converted my existing model:

model = model.to(device='cuda', memory_format=torch.channels_last)

and in the training loop:

from torch.cuda.amp import autocast

for i, (X, y) in enumerate(dataloader):
    # move the batch to the GPU in channels_last layout
    X = X.to(device='cuda', memory_format=torch.channels_last, non_blocking=True)
    with autocast():
        out = model(X)
    # ...

Is that about right to make the model tensor core compatible?

Thanks!

ptrblck · May 30, 2022, 7:30pm

Not necessarily, since cuDNN, for example, could pad your inputs internally and still use Tensor Cores.
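To make the "multiples of 8" arithmetic concrete for the shape in the question, here is a minimal sketch in plain Python. The helper name `round_up` and the `shape` dictionary are hypothetical, introduced only for illustration. It shows which sizes are already aligned and what the next multiple of 8 would be. It does not describe how cuDNN actually pads internally, which depends on the kernel it selects.

```python
def round_up(value: int, multiple: int = 8) -> int:
    """Return the smallest multiple of `multiple` that is >= value."""
    return ((value + multiple - 1) // multiple) * multiple


# Shape from the question, interpreted as BCHW (hypothetical labels).
shape = {"B": 32, "C": 15, "H": 10, "W": 256}

# Size each dimension would have after rounding up to a multiple of 8.
padded = {name: round_up(size) for name, size in shape.items()}

for name, size in shape.items():
    aligned = size % 8 == 0
    print(f"{name}: {size} -> {padded[name]} (already a multiple of 8: {aligned})")
```

Only C and H change here (to 16 each), while B and W are already multiples of 8.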

Yes, the channels-last usage looks correct.
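If you want to double-check that a tensor really ended up in channels-last layout, a quick sanity check along these lines may help. It is only a sketch: it runs on the CPU with a random tensor of the shape from the question, and the variable names are made up for the example. The logical shape stays NCHW, only the strides change.

```python
import torch

# Random stand-in for a batch with the NCHW shape from the question.
x = torch.randn(32, 15, 10, 256)

# Convert to channels_last; the logical shape is unchanged.
x_cl = x.to(memory_format=torch.channels_last)

print(x_cl.shape)     # still (32, 15, 10, 256)
print(x_cl.stride())  # channel dimension now has stride 1
print(x_cl.is_contiguous(memory_format=torch.channels_last))
```

The same `is_contiguous(memory_format=torch.channels_last)` check can be applied to the batch inside the training loop.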