Tensor format to utilize tensor cores?


In the performance guide (Performance Tuning Guide - PyTorch Tutorials 1.11.0+cu102 documentation), it says:

To use Tensor Cores: set sizes to multiples of 8 (to map onto dimensions of Tensor Cores)

Does this mean that when I have a BCHW tensor of shape (32, 15, 10, 256), operations on this tensor within the autocast() context manager will not be mapped to Tensor Cores at all, because the C and H dimensions are not multiples of 8? In my application the dimension H=10 is a fixed parameter that must not be changed. Does that mean I won't be able to utilize Tensor Cores at all?
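(For context: even when a dimension is fixed logically, the tensor can often be padded to the next multiple of 8 before the heavy op and cropped afterwards. A minimal sketch of the size arithmetic involved; the helper name `pad_to_multiple` is my own, not a PyTorch API:)

```python
def pad_to_multiple(n, multiple=8):
    """Return how many elements must be appended so that n becomes a multiple of `multiple`."""
    return (-n) % multiple

# For the shape (32, 15, 10, 256) from the question:
print(pad_to_multiple(10))   # 6 -> padded H would be 16
print(pad_to_multiple(15))   # 1 -> padded C would be 16
print(pad_to_multiple(256))  # 0 -> already aligned
```

(In PyTorch the actual padding could be applied with `torch.nn.functional.pad`; whether that is even necessary depends on what cuDNN does internally for the given version.)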

Furthermore, I also saw in the performance guide that the channels_last memory format may be preferable to work with Tensor Cores. So, following the guide (Performance Tuning Guide - PyTorch Tutorials 1.11.0+cu102 documentation), I convert my existing model:

model = model.to(device='cuda', memory_format=torch.channels_last)

and in the training loop:

for i, (X, y) in enumerate(dataloader):
    X = X.to(device='cuda', memory_format=torch.channels_last, non_blocking=True)
    with autocast():
        out = model(X)
    # ...

Is that about right to make the model tensor core compatible?
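(For intuition: channels_last keeps the logical NCHW shape but stores the data with C as the fastest-varying dimension, which PyTorch expresses via strides. A pure-Python sketch of the two stride layouts, illustrative only; PyTorch computes these internally:)

```python
def contiguous_strides(shape):
    """Row-major (NCHW 'contiguous') strides, in elements."""
    strides = [1] * len(shape)
    for i in range(len(shape) - 2, -1, -1):
        strides[i] = strides[i + 1] * shape[i + 1]
    return strides

def channels_last_strides(nchw):
    """Strides for an NCHW shape stored physically as NHWC (channels last)."""
    n, c, h, w = nchw
    # physical NHWC row-major layout, reported in NCHW order
    return [h * w * c, 1, w * c, c]

shape = (32, 15, 10, 256)  # the BCHW tensor from the question
print(contiguous_strides(shape))     # [38400, 2560, 256, 1]
print(channels_last_strides(shape))  # [38400, 1, 3840, 15]
```

(These match what `x.stride()` reports after `x = x.to(memory_format=torch.channels_last)`, and `x.is_contiguous(memory_format=torch.channels_last)` can be used to verify that the conversion actually took effect.)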


Not necessarily, as e.g. cuDNN could pad your inputs internally and still use Tensor Cores.

Yes, the channels-last usage looks correct.
