Tensor Cores and mixed precision *matrix multiplication* - output in float32

Thanks K. Frank,

Sorry, I might not have been clear. I'm aware of the trade-off, but I thought that NVIDIA returns the result in fp32 (as per the link above). So I'd guess (maybe wrongly?) that it's just PyTorch that doesn't support returning the fp32 result.
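
To make concrete what I mean by "output in fp32", here is a rough pure-Python sketch. It is not PyTorch or cuBLAS code, and it only mimics the numerics as I understand them: inputs rounded to fp16, products accumulated in an fp32 accumulator, and then the result either kept in fp32 or rounded back to fp16. The helper names (`to_fp16`, `to_fp32`, `dot_fp16_inputs`) are made up for illustration.

```python
import struct

def to_fp16(x):
    """Round a Python float to the nearest IEEE half-precision value."""
    return struct.unpack("e", struct.pack("e", x))[0]

def to_fp32(x):
    """Round a Python float to the nearest IEEE single-precision value."""
    return struct.unpack("f", struct.pack("f", x))[0]

def dot_fp16_inputs(a, b):
    """Dot product with fp16 inputs and an fp32 accumulator (illustrative only).

    Returns a pair: (result kept in fp32, same result rounded to fp16).
    """
    acc = 0.0
    for x, y in zip(a, b):
        prod = to_fp16(x) * to_fp16(y)  # product of two fp16 values fits in fp32
        acc = to_fp32(acc + prod)       # accumulate in fp32
    return acc, to_fp16(acc)

# 2**-12 is representable in fp16, but 1 + 2**-12 is not.
out32, out16 = dot_fp16_inputs([1.0, 2**-12], [1.0, 1.0])
print(out32)  # fp32 output keeps the small term
print(out16)  # fp16 output rounds it away
```

So the small term survives only if the result is handed back in fp32, which is the output mode I was asking about.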