In the performance guide (Performance Tuning Guide — PyTorch Tutorials 1.11.0+cu102 documentation), it says:

> To use Tensor Cores: set sizes to multiples of 8 (to map onto dimensions of Tensor Cores)
Does this mean that, for a BCHW tensor of shape (32, 15, 10, 256), operations on this tensor within the autocast() context manager will not be mapped to Tensor Cores at all, because the C and H dimensions are not multiples of 8? In my application the dimension H=10 is a fixed parameter that must not be changed. Does that mean I won't be able to utilize Tensor Cores at all?
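For context, if padding turned out to be the recommended workaround, this is roughly what I had in mind (a pure-Python sketch; `next_multiple` is my own hypothetical helper, not something from the guide):

```python
def next_multiple(n: int, k: int = 8) -> int:
    """Round n up to the nearest multiple of k (k=8 per the Tensor Core advice)."""
    return ((n + k - 1) // k) * k

# For my tensor (N, C, H, W) = (32, 15, 10, 256), padding each
# non-conforming dimension up would give:
print(next_multiple(15))   # C: 15 -> 16
print(next_multiple(10))   # H: 10 -> 16
print(next_multiple(256))  # W: already a multiple of 8 -> 256
```

But since H=10 is fixed in my application, I'd like to know whether padding is even necessary for the spatial dimensions, or only for the channel dimensions.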
Furthermore, the same performance guide says that the channels_last memory format may be preferable when working with Tensor Cores. So, following the guide, I convert my existing model:
```python
model = model.to(device='cuda', memory_format=torch.channels_last)
```
and in the training loop:
```python
for i, (X, y) in enumerate(dataloader):
    X = X.to(device='cuda', memory_format=torch.channels_last, non_blocking=True)
    with autocast():
        out = model(X)
        # ...
```
Is that about right to make the model Tensor Core compatible?