Using the default code, the profiler reports no FP16 instructions and the Tensor Cores are idle; all floating-point instructions are 32-bit.
What should I do in order to use them?
To use Tensor Cores, you would have to run (some) operations in FP16. You could cast tensors and modules manually via .half(), but I would recommend using our mixed-precision approach in apex instead, as it keeps numerically sensitive operations in FP32 for you.
We are currently working on a direct PyTorch integration.
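A minimal sketch of the manual .half() route (the layer sizes here are arbitrary examples; apex.amp would instead wrap the model and optimizer for you):

```python
import torch

# Cast a module and its input to FP16 manually via .half().
model = torch.nn.Linear(64, 64)
model.half()  # parameters are now torch.float16
x = torch.randn(8, 64).half()

print(model.weight.dtype)  # torch.float16
print(x.dtype)             # torch.float16

# FP16 matmuls can be dispatched to Tensor Cores only on a
# suitable GPU (Volta or newer), so move everything to CUDA first.
if torch.cuda.is_available():
    out = model.cuda()(x.cuda())
```

Note that pure FP16 training can suffer from underflow in gradients; this is one reason the mixed-precision approach (which adds loss scaling) is generally preferable to a blanket .half() cast.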