Why do some operators need to be float32?

amsword · June 8, 2021, 4:57pm

In mixed-precision training, some operators can run in fp16 while others cannot. My question is why those operators, e.g. layer norm, cannot be fp16. One way to decide would be to train the model with the operator switched to fp16 and, if the resulting model performs worse, keep that operator in fp32. But this test could be very expensive. Is there a way to figure out by some rules whether an operator should be fp32 or fp16?
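
One cheap check that does not require training is to look at the arithmetic an operator performs. The sketch below is only an illustration, not how any framework decides: it emulates fp16 rounding with the Python standard library (`to_fp16` and `sum_of_squares_fp16` are hypothetical helper names) and shows two typical failure modes. A sum of squares, like the one inside a layer norm variance, overflows past the fp16 maximum of 65504, and a small increment added to a larger value is rounded away.

```python
import math
import struct


def to_fp16(x: float) -> float:
    """Round a Python float to the nearest fp16 value (hypothetical helper).

    struct's "e" format is IEEE 754 half precision. It raises OverflowError
    for finite values that are too large, which we map to +/-inf here to
    mimic what fp16 hardware arithmetic would produce.
    """
    try:
        return struct.unpack("<e", struct.pack("<e", x))[0]
    except OverflowError:
        return math.copysign(math.inf, x)


def sum_of_squares_fp16(values):
    """Accumulate sum(v * v) with every intermediate result rounded to fp16."""
    acc = 0.0
    for v in values:
        h = to_fp16(v)
        sq = to_fp16(h * h)       # the square itself may already overflow
        acc = to_fp16(acc + sq)   # the running sum is also kept in fp16
    return acc


activations = [300.0] * 4

# Reference result in Python's native double precision.
reference = sum(v * v for v in activations)

# Same reduction with fp16 intermediates: 300 * 300 = 90000 > 65504.
fp16_result = sum_of_squares_fp16(activations)

# Small increments vanish: the fp16 spacing just above 1.0 is about 0.001.
lost_update = to_fp16(to_fp16(1.0) + to_fp16(0.0001))

print(reference, fp16_result, lost_update)
```

Under these assumptions, a rough rule of thumb is that operators built on large reductions, squares, or exponentials (normalization layers, softmax, losses) are the usual candidates for fp32, while matrix multiplications and convolutions tend to tolerate fp16. That is a heuristic rather than a guarantee, so it narrows down which operators are worth testing instead of replacing the test.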