Does PyTorch use tensor cores by default?

indramal · December 6, 2022, 10:50am

Does PyTorch use tensor cores by default during training?

For example, when the model is moved to the GPU with the following code:

import torch

device = torch.device("cuda:0" if torch.cuda.is_available() else "cpu")
model = model.to(device)

eqy · December 6, 2022, 6:39pm

It depends on the GPU, the operations, and the data types being used. On Volta, fp16 should use tensor cores by default for common ops like matmul and conv. On Ampere and newer, fp16 and bf16 should use tensor cores for common ops, and fp32 convolutions use them via TF32. You can also enable tensor cores for fp32 matmuls on Ampere and newer via torch.set_float32_matmul_precision (see the PyTorch 1.13 documentation), but that is not the default.
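As a rough illustration of the rule of thumb above, here is a minimal sketch that maps a GPU's compute capability to the data types that can reach tensor cores, then opts in to TF32 for fp32 matmuls. The helper tensor_core_dtypes is hypothetical (it is not a PyTorch API), and the mapping only encodes what is stated in this post: Volta is compute capability 7.0 and Ampere is 8.0.

```python
import torch


def tensor_core_dtypes(major: int, minor: int = 0) -> list[str]:
    """Hypothetical helper: dtypes that can hit tensor cores for a given
    CUDA compute capability, following the rule of thumb in this thread."""
    dtypes = []
    if major >= 7:  # Volta (7.0) and newer
        dtypes.append("fp16")
    if major >= 8:  # Ampere (8.0) and newer
        dtypes += ["bf16", "tf32"]
    return dtypes


if torch.cuda.is_available():
    major, minor = torch.cuda.get_device_capability(0)
    print(torch.cuda.get_device_name(0), tensor_core_dtypes(major, minor))
else:
    print("No CUDA device visible, so tensor cores are not in play.")

# Opt in to TF32 for fp32 matmuls; the default setting is "highest".
torch.set_float32_matmul_precision("high")
print(torch.get_float32_matmul_precision())
```

Whether a particular kernel actually runs on tensor cores is still decided by cuBLAS and cuDNN at run time, so treat the output as a capability check and not as proof of use.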

indramal · December 6, 2022, 9:32pm

You are correct, but I need a way to confirm it. If tensor cores are not being used for common ops, is there a way to enable them? That is what I need to know. The .is_cuda attribute only tells whether a tensor is on a CUDA device or not. I think PyTorch treats CUDA cores and tensor cores the same way there, so there is no option to check for tensor cores specifically.

Likewise, when writing code for automatic mixed precision, we don’t know whether it uses CUDA cores or tensor cores.
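For reference, a minimal mixed precision sketch, assuming a stand-in torch.nn.Linear model (not the model from this thread). It does not report tensor core usage directly. It only shows that ops inside torch.autocast produce reduced-precision outputs, which is the precondition for tensor cores mentioned earlier in the thread. The fallback to bf16 on CPU is there just so the snippet also runs without a GPU.

```python
import torch

device_type = "cuda" if torch.cuda.is_available() else "cpu"
# fp16 is the usual autocast dtype on CUDA; CPU autocast uses bf16.
amp_dtype = torch.float16 if device_type == "cuda" else torch.bfloat16

# Stand-in model and input (assumptions for illustration only).
model = torch.nn.Linear(64, 32).to(device_type)
x = torch.randn(16, 64, device=device_type)

with torch.autocast(device_type=device_type, dtype=amp_dtype):
    out = model(x)  # the matmul inside Linear runs in amp_dtype

print(x.dtype, "->", out.dtype)
```

If the output dtype is still float32 for a matmul or conv layer, that layer ran outside autocast and would fall back to the fp32 rules discussed in this thread.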

suraj.pt · December 6, 2022, 10:17pm

No, it doesn’t use them by default for fp32 matmuls in v1.12 or later. You have to set:

# The flag below controls whether to allow TF32 on matmul. This flag defaults to False
# in PyTorch 1.12 and later.
torch.backends.cuda.matmul.allow_tf32 = True

# The flag below controls whether to allow TF32 on cuDNN. This flag defaults to True.
torch.backends.cudnn.allow_tf32 = True

See: CUDA semantics — PyTorch 1.13 documentation
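To see whether the matmul flag changes anything on a given machine, a small timing sketch like the one below may help. The function time_matmul and its default sizes are assumptions for illustration. On a CPU or a pre-Ampere GPU both settings are expected to take roughly the same time, since TF32 does not apply there.

```python
import time

import torch


def time_matmul(allow_tf32: bool, n: int = 1024, iters: int = 10) -> float:
    """Average seconds per n x n fp32 matmul with the TF32 flag set as given."""
    device = "cuda" if torch.cuda.is_available() else "cpu"
    previous = torch.backends.cuda.matmul.allow_tf32
    torch.backends.cuda.matmul.allow_tf32 = allow_tf32
    try:
        a = torch.randn(n, n, device=device)
        b = torch.randn(n, n, device=device)
        a @ b  # warm-up so one-time setup cost is not timed
        if device == "cuda":
            torch.cuda.synchronize()  # CUDA kernels launch asynchronously
        start = time.perf_counter()
        for _ in range(iters):
            a @ b
        if device == "cuda":
            torch.cuda.synchronize()
        return (time.perf_counter() - start) / iters
    finally:
        # Restore the flag so the rest of the program is unaffected.
        torch.backends.cuda.matmul.allow_tf32 = previous


for flag in (False, True):
    print(f"allow_tf32={flag}: {time_matmul(flag) * 1e3:.3f} ms per matmul")
```

A real training step also includes data loading, small kernels, and non-matmul ops, so a gain measured on an isolated matmul may not carry over to end-to-end time.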

indramal · December 7, 2022, 1:00pm

I enabled both on an A100 GPU, but saw no speedup.

eqy · December 7, 2022, 5:17pm

Could you post a minimal version of the code that you are trying to speed up?