RuntimeError: Function 'ToCopyBackward0' returned nan values in its 0th output

Hi all, I am training ResNet101 with cross entropy loss, and loss.backward() gives me this error when I switch to fp16 precision. I cannot use fp32 because my GPU and memory resources are limited.
I am confused by this error. I tried some optimization settings, but they did not work.
I would really appreciate any advice.

Note that invalid gradients are expected at the beginning of training when using amp with float16, and occasionally later in training as well. In these cases the GradScaler will skip the parameter update and reduce the loss scaling factor. If you have anomaly detection enabled from the beginning, you might need to disable it, since it raises this error on the same nan values that the GradScaler would otherwise handle by skipping the step.
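
For reference, here is a minimal sketch of what such a training loop could look like, with anomaly detection left off so the GradScaler can skip the bad steps. The function name `train_steps`, the SGD optimizer, and the learning rate are placeholders I made up for illustration, not taken from your code. It uses `torch.cuda.amp.GradScaler`; newer PyTorch releases prefer `torch.amp.GradScaler("cuda")` and may print a deprecation warning for the older name. On a machine without CUDA the scaler and autocast are simply disabled, so the loop runs in fp32.

```python
import torch
import torch.nn as nn


def train_steps(model, batches, lr=0.01):
    """Run one AMP training step per batch and return the per-step losses.

    Hypothetical helper for illustration only.
    """
    use_cuda = torch.cuda.is_available()
    device = "cuda" if use_cuda else "cpu"
    # float16 autocast is only used on the GPU; the CPU path is disabled anyway.
    amp_dtype = torch.float16 if use_cuda else torch.bfloat16

    model = model.to(device)
    optimizer = torch.optim.SGD(model.parameters(), lr=lr)
    criterion = nn.CrossEntropyLoss()

    # With enabled=False the scaler becomes a pass-through (CPU-only machines).
    scaler = torch.cuda.amp.GradScaler(enabled=use_cuda)

    # Keep anomaly detection off: it would raise on the inf/nan gradients
    # that the scaler is supposed to detect and skip. Note this is a global setting.
    torch.autograd.set_detect_anomaly(False)

    losses = []
    for inputs, targets in batches:
        inputs, targets = inputs.to(device), targets.to(device)
        optimizer.zero_grad()
        with torch.autocast(device_type=device, dtype=amp_dtype, enabled=use_cuda):
            loss = criterion(model(inputs), targets)
        # Scale the loss so small fp16 gradients do not underflow.
        scaler.scale(loss).backward()
        # step() skips optimizer.step() if the gradients contain inf/nan.
        scaler.step(optimizer)
        # update() lowers the scale factor after a skipped step.
        scaler.update()
        losses.append(loss.item())
    return losses
```

You would call it with your own model and data loader, e.g. `train_steps(model, train_loader)`. If the loss itself becomes nan (not just the gradients in a few early steps), that points to a different problem, such as an overflow in the forward pass, and skipping steps will not fix it.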