Hi all, I am training ResNet101 with cross-entropy loss, and loss.backward() gives me this error when I switch on fp16 precision. I cannot use fp32 because my GPU memory is limited.
I am confused about this error. I tried some optimization settings, but they did not work.
I would really appreciate any advice.
Thanks
Note that invalid gradients are expected at the beginning of training when using amp with float16, as well as occasionally later in training. In these cases the GradScaler will skip the parameter update and reduce the loss scaling factor. If you are using anomaly detection from the start of training, you might need to disable it, since these expected invalid gradients would otherwise trigger it.
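For reference, here is a minimal sketch of the usual amp training step with a GradScaler. The model, optimizer, and data are placeholders (not from your post); the point is that `scaler.step()` silently skips the update when it finds inf/NaN gradients, and `scaler.update()` then lowers the loss scale:

```python
import torch
import torch.nn as nn

# Placeholder model/optimizer/data, just to make the sketch runnable.
device = "cuda" if torch.cuda.is_available() else "cpu"
model = nn.Linear(10, 2).to(device)
optimizer = torch.optim.SGD(model.parameters(), lr=0.1)
criterion = nn.CrossEntropyLoss()

# GradScaler skips optimizer.step() and reduces the loss scale
# whenever the scaled gradients contain inf/NaN values, which is
# expected early in training. Disabled automatically on CPU here.
scaler = torch.cuda.amp.GradScaler(enabled=(device == "cuda"))

# If you enabled torch.autograd.set_detect_anomaly(True), turn it
# off during normal amp training, since the expected invalid
# gradients would otherwise raise an error.
torch.autograd.set_detect_anomaly(False)

x = torch.randn(4, 10, device=device)
y = torch.randint(0, 2, (4,), device=device)

for _ in range(3):
    optimizer.zero_grad()
    with torch.cuda.amp.autocast(enabled=(device == "cuda")):
        out = model(x)
        loss = criterion(out, y)
    scaler.scale(loss).backward()  # backward on the scaled loss
    scaler.step(optimizer)         # skipped if grads are inf/NaN
    scaler.update()                # adjusts the loss scale
```

A few skipped steps at the start are normal and the training usually recovers once the scale settles; it only indicates a real problem if the loss itself becomes NaN.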