Using Nvidia Tensor Cores for float32 mm computation

Hongzhang_Shan · July 29, 2020, 3:06am

On an Nvidia V100, CUDA supports computing float32 matrix multiplication (mm) on Tensor Cores instead of FP32 cores. Does PyTorch support this? I mean, using Tensor Cores to compute torch.mm for the float32 data type.

ngimel · July 30, 2020, 1:47am

On V100, CUDA (cuBLAS) does not support using Tensor Cores instead of FP32 cores for FP32 data, so PyTorch does not support it either.

Hongzhang_Shan · July 30, 2020, 7:54am

I read through https://docs.nvidia.com/cuda/cublas/index.html and found that CUBLAS_COMPUTE_32F_FAST_16F, a compute type accepted by cublasGemmEx and cublasLtMatmul, “Allows the library to use Tensor Cores with automatic down-conversion and 16-bit half-precision compute for 32-bit input and output matrices.”

So I thought this conversion was supported. Maybe I am wrong? Thanks.
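To see what the "automatic down-conversion" in that quote costs in accuracy, here is a minimal CPU-only sketch. It does not call cuBLAS and does not model real Tensor Core arithmetic. It rounds each fp32 input element to IEEE half precision using the stdlib `struct` module's `'e'` format, then multiplies and accumulates in ordinary Python floats (an assumption: the accumulation precision actually used under CUBLAS_COMPUTE_32F_FAST_16F is not emulated here). The helpers `to_half` and `matmul` are hypothetical names for illustration only.

```python
import struct


def to_half(x: float) -> float:
    """Round a Python float to the nearest IEEE 754 half-precision (fp16) value."""
    # struct's 'e' format packs a binary16 value; unpacking gives it back as a float.
    return struct.unpack("<e", struct.pack("<e", x))[0]


def matmul(a, b, round_inputs=False):
    """Naive matrix product of nested lists (hypothetical helper).

    With round_inputs=True, every input element is first rounded to fp16,
    emulating only the down-conversion step. Products and sums are then
    accumulated in double precision (an assumption, not cuBLAS behavior).
    """
    conv = to_half if round_inputs else (lambda v: v)
    n, k, m = len(a), len(b), len(b[0])  # a is n x k, b is k x m
    a_c = [[conv(v) for v in row] for row in a]
    b_c = [[conv(v) for v in row] for row in b]
    return [
        [sum(a_c[i][p] * b_c[p][j] for p in range(k)) for j in range(m)]
        for i in range(n)
    ]


a = [[1.0001, 2.0], [3.0, 4.0]]
b = [[1.0, 0.0], [0.0, 1.0]]  # identity, so the product should equal a

exact = matmul(a, b)
fast = matmul(a, b, round_inputs=True)
print(exact[0][0], fast[0][0])  # prints: 1.0001 1.0
```

The identity matrix makes the effect easy to read: 1.0001 is not representable in fp16 (the spacing between fp16 values near 1.0 is 2**-10), so it rounds to 1.0 before the multiply, while small integers such as 3.0 and 4.0 survive the conversion unchanged.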