I want to write a custom layer using CUDA. However, it fails when I use NVIDIA Apex to train the model with mixed precision. What should I do? Is there any example of an FP16 CUDA layer?
You could use AT_DISPATCH_FLOATING_TYPES_AND_HALF to dispatch the code for the float16 type and use scalar_t in the kernel (similar to e.g. this code).
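As a rough sketch of that pattern, the extension below dispatches an elementwise kernel over float, double, and half. The op name my_relu and the launch configuration are illustrative placeholders, not code from the linked example:

```cpp
#include <torch/extension.h>

// Templated on scalar_t, which the dispatch macro resolves per dtype
// (at::Half when the input tensor is float16).
template <typename scalar_t>
__global__ void my_relu_kernel(const scalar_t* __restrict__ in,
                               scalar_t* __restrict__ out,
                               int64_t n) {
  const int64_t i = blockIdx.x * (int64_t)blockDim.x + threadIdx.x;
  if (i < n) {
    out[i] = in[i] > scalar_t(0) ? in[i] : scalar_t(0);
  }
}

torch::Tensor my_relu(torch::Tensor input) {
  auto output = torch::empty_like(input);
  const int64_t n = input.numel();
  const int threads = 256;
  const int blocks = (n + threads - 1) / threads;

  // The _AND_HALF variant adds an at::kHalf case on top of float/double,
  // so the same kernel template also covers mixed-precision inputs.
  AT_DISPATCH_FLOATING_TYPES_AND_HALF(
      input.scalar_type(), "my_relu", ([&] {
        my_relu_kernel<scalar_t><<<blocks, threads>>>(
            input.data_ptr<scalar_t>(), output.data_ptr<scalar_t>(), n);
      }));
  return output;
}

PYBIND11_MODULE(TORCH_EXTENSION_NAME, m) {
  m.def("my_relu", &my_relu, "ReLU (CUDA, half-aware)");
}
```

The key point is that the kernel body never hardcodes float: everything goes through scalar_t, so the half-precision instantiation comes for free once the dispatch macro includes the half case.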
Also note that we recommend using the native mixed-precision training utility via torch.cuda.amp instead of apex.amp.
These examples may help.
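A minimal torch.cuda.amp training loop looks roughly like the sketch below. It assumes a CUDA device is available; the model, loss, and synthetic data are placeholders:

```python
import torch

# Placeholder model and optimizer for illustration.
model = torch.nn.Linear(10, 1).cuda()
optimizer = torch.optim.SGD(model.parameters(), lr=0.1)
scaler = torch.cuda.amp.GradScaler()

for _ in range(3):
    data = torch.randn(8, 10, device="cuda")
    target = torch.randn(8, 1, device="cuda")

    optimizer.zero_grad()
    # Inside autocast, eligible ops run in float16; a custom CUDA op
    # written with scalar_t dispatch will receive half-precision inputs here.
    with torch.cuda.amp.autocast():
        output = model(data)
        loss = torch.nn.functional.mse_loss(output, target)

    # GradScaler scales the loss to avoid underflow in float16 gradients.
    scaler.scale(loss).backward()
    scaler.step(optimizer)
    scaler.update()
```

Unlike apex.amp, this needs no model or optimizer rewrapping, only the autocast context and the GradScaler around the backward/step calls.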