It would depend on the GPU, the operations, and the data types being used. On Volta, fp16 should use tensor cores by default for common ops such as matmul and conv. On Ampere and newer, fp16 and bf16 should use tensor cores by default for common ops, and fp32 convolutions also use them by default via TF32. fp32 matmuls can use tensor cores via TF32 as well, but that is not the default; you can opt in through torch.set_float32_matmul_precision — PyTorch 1.13 documentation.
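
For example, here is a minimal sketch of opting fp32 matmuls into TF32 (it assumes an Ampere-or-newer GPU with CUDA available; the tensor shapes are just illustrative):

```python
import torch

# Assumes an Ampere-or-newer GPU (compute capability >= 8.0) with CUDA.
# fp16/bf16 matmuls and convs already dispatch to tensor cores by default.

# Opt fp32 matmuls into TF32 tensor cores.
# "highest" (the default) keeps full fp32 precision and avoids tensor cores.
torch.set_float32_matmul_precision("high")

# Lower-level switches controlling the same behavior:
torch.backends.cuda.matmul.allow_tf32 = True  # fp32 matmuls via TF32 (default: False)
torch.backends.cudnn.allow_tf32 = True        # fp32 convs via TF32 (default: True)

a = torch.randn(1024, 1024, device="cuda")
b = torch.randn(1024, 1024, device="cuda")
c = a @ b  # now eligible to run on tensor cores via TF32
```

Note that `torch.set_float32_matmul_precision("medium")` goes a step further and allows fp32 matmuls to use bf16 internally, trading more precision for speed.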