On NVIDIA V100, CUDA supports computing float32 matrix multiplications on Tensor Cores instead of the FP32 cores. Does PyTorch support this? In other words, can it use Tensor Cores to compute torch.mm on float32 data?
On V100, CUDA (cuBLAS) does not support using Tensor Cores instead of FP32 cores for FP32 data, so PyTorch does not support it either.
I read through https://docs.nvidia.com/cuda/cublas/index.html and found that CUBLAS_COMPUTE_32F_FAST_16F, a compute type used with cublasGemmEx and cublasLtMatmul, is described as follows:
“Allows the library to use Tensor Cores with automatic down-conversion and 16-bit half-precision compute for 32-bit input and output matrices.”
So I thought this conversion was supported. Maybe I am wrong? Thanks.
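One rough way to see what the quoted "automatic down-conversion" costs in accuracy is to emulate it on the CPU. The sketch below is a simplified model, not cuBLAS: it does not call cuBLAS or PyTorch and does not touch Tensor Cores. It assumes the main effect is rounding each FP32 input to FP16 (done here with the standard library's struct "e" format), accumulates the products in Python's double precision, and rounds each output to FP32. It does not reproduce cuBLAS's actual accumulation order or internal precision. The helper names (to_fp16, to_fp32, matmul, max_abs_error) are hypothetical.

```python
import random
import struct


def to_fp16(x: float) -> float:
    """Round x to the nearest IEEE 754 binary16 (half-precision) value.

    struct's "e" format packs to binary16; unpacking returns the rounded
    value as a Python float. Raises OverflowError beyond the FP16 range.
    """
    return struct.unpack("e", struct.pack("e", x))[0]


def to_fp32(x: float) -> float:
    """Round x to the nearest IEEE 754 binary32 (single-precision) value."""
    return struct.unpack("f", struct.pack("f", x))[0]


def matmul(a, b, round_inputs):
    """Return A @ B after passing every input element through round_inputs.

    Products are summed in double precision and each output element is
    rounded to FP32 (modelling FP32 output storage). Only the input
    down-conversion is modelled, not cuBLAS's real accumulation.
    """
    a = [[round_inputs(v) for v in row] for row in a]
    b = [[round_inputs(v) for v in row] for row in b]
    n, k, m = len(a), len(b), len(b[0])
    return [
        [to_fp32(sum(a[i][p] * b[p][j] for p in range(k))) for j in range(m)]
        for i in range(n)
    ]


def max_abs_error(c, ref):
    """Largest elementwise absolute difference between two matrices."""
    return max(
        abs(x - y) for row_c, row_r in zip(c, ref) for x, y in zip(row_c, row_r)
    )


random.seed(0)
N, K = 4, 64
A = [[random.uniform(-1, 1) for _ in range(K)] for _ in range(N)]
B = [[random.uniform(-1, 1) for _ in range(N)] for _ in range(K)]

reference = matmul(A, B, round_inputs=float)  # inputs kept in double precision
fp32 = matmul(A, B, round_inputs=to_fp32)     # plain FP32 inputs
fast16 = matmul(A, B, round_inputs=to_fp16)   # inputs down-converted to FP16

err32 = max_abs_error(fp32, reference)
err16 = max_abs_error(fast16, reference)
print(f"max |error| with FP32 inputs: {err32:.2e}")
print(f"max |error| with FP16 inputs: {err16:.2e}")
```

With inputs in [-1, 1] and K = 64, the FP16-input error should typically come out several orders of magnitude larger than the FP32-input error. That is the precision trade-off a compute type like CUBLAS_COMPUTE_32F_FAST_16F accepts in exchange for Tensor Core throughput.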