In the performance guide (Performance Tuning Guide — PyTorch Tutorials 1.11.0+cu102 documentation), it says:

> To use Tensor Cores: set sizes to multiples of 8 (to map onto dimensions of Tensor Cores)
Does this mean that, for a BCHW tensor of shape (32, 15, 10, 256), operations on this tensor within the autocast() context manager will not be mapped to Tensor Cores at all, because the C and H dimensions are not multiples of 8? In my application the dimension H=10 is a fixed parameter that must not be changed. Does that mean I won't be able to utilize Tensor Cores at all?
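For context, if padding turned out to be the recommended workaround, this is roughly what I had in mind (a pure-Python sketch; `next_multiple` is my own hypothetical helper, not something from the guide):

```python
def next_multiple(n: int, k: int = 8) -> int:
    """Round n up to the nearest multiple of k (k=8 per the Tensor Core advice)."""
    return ((n + k - 1) // k) * k

# For my tensor (N, C, H, W) = (32, 15, 10, 256), padding each
# non-conforming dimension up would give:
print(next_multiple(15))   # C: 15 -> 16
print(next_multiple(10))   # H: 10 -> 16
print(next_multiple(256))  # W: already a multiple of 8 -> 256
```

But since H=10 is fixed in my application, I'd like to know whether padding is even necessary for the spatial dimensions, or only for the channel dimensions.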
Furthermore, the same performance guide says that the channels_last memory format may be preferable when working with Tensor Cores. So, following the guide, I convert my existing model:
```python
model = model.to(device='cuda', memory_format=torch.channels_last)
```
and in the training loop:
```python
for i, (X, y) in enumerate(dataloader):
    X = X.to(device='cuda', memory_format=torch.channels_last, non_blocking=True)
    with autocast():
        out = model(X)
        # ...
```
Is that about right to make the model Tensor Core compatible?